


ABSTRACT OF THE DISCLOSURE

A novel seed coat specific peroxidase genomic sequence is characterized and
presented. Adjacent DNA regulatory regions have also been characterized. The seed
coat peroxidase is translated as a 352 amino acid precursor protein of 38 kDa
comprising a 26 amino acid signal sequence which, when cleaved, yields a 35 kDa
protein. Plants containing a dominant Ep allele accumulate large amounts of
peroxidase in the hourglass cells of the subepidermis. Homozygous recessive epep
genotypes do not accumulate peroxidase in the hourglass cells and are much reduced
in total seed coat peroxidase activity. Probes derived from the cDNA or genomic
DNA can be used to detect polymorphisms that distinguish EpEp and epep
genotypes. Cosegregation of the polymorphisms in an F2 population from a cross of
EpEp and epep plants shows that the Ep locus encodes the seed coat peroxidase
protein. Comparison of the Ep and ep alleles indicates that the recessive allele lacks 87 bp
of sequence encompassing the translation start codon. The heterologous expression
of the seed coat peroxidase, as well as vectors and hosts to be used for this expression,
are also disclosed. The seed-specific DNA regulatory region may be used to control
expression of genes of interest such as i) genes encoding herbicide resistance,
ii) genes for the biological control of insects or pathogens (e.g. B. thuringiensis),
iii) viral coat proteins to protect against viral infections, iv) proteins of commercial
interest (e.g. pharmaceuticals), or v) proteins that alter the nutritive value, taste, or
processing of seeds.
Seed coat specific DNA regulatory region and peroxidase
The present invention relates to a novel DNA molecule comprising a plant seed 
coat specific DNA regulatory region and a novel structural gene encoding a peroxidase. 
The seed-coat specific DNA regulatory region may also be used to control the 
expression of other genes of interest within the seed coat. 


BACKGROUND OF THE INVENTION 

Full citations for references appear at the end of the Examples section. 

Peroxidases are enzymes catalyzing oxidative reactions that use H2O2 as an
electron acceptor. These enzymes are widespread and occur ubiquitously in plants as
isozymes that may be distinguished by their isoelectric points. Plant peroxidases
contribute to the structural integrity of cell walls by functioning in lignin biosynthesis
and suberization, and by forming covalent cross-linkages between extensin, cellulose,
pectin and other cell wall constituents (Campa, 1991). Peroxidases are also associated
with plant defence responses and resistance to pathogens (Bowles, 1990;
Moerschbacher, 1992). Soybeans contain three anionic isozymes of peroxidase with a
minimum Mr of 37 kDa (Sessa and Anderson, 1981). Recently, one peroxidase
isozyme localised within the seed coat of soybean has been characterized, with an Mr
of 37 kDa (Gillikin and Graham, 1991).
In an analysis of soybean seeds, Buttery and Buzzell (1968) showed that the
amount of peroxidase activity present in seed coats may vary substantially among
different cultivars. The presence of a single dominant gene Ep causes a high seed coat
peroxidase phenotype (Buzzell and Buttery, 1969). Homozygous recessive epep plants
are ~100-fold lower in seed coat peroxidase activity. This results from a reduction in
the amount of peroxidase enzyme present, primarily in the hourglass cells of the
subepidermis (Gijzen et al., 1993). In plants carrying the Ep gene, peroxidase is
heavily concentrated in the hourglass cells (osteosclereids). These cells form a highly
differentiated cell layer with thick, elongated secondary walls and large intercellular
spaces (Baker et al., 1987). Hourglass cells develop between the epidermal
macrosclereids and the underlying articulated parenchyma, and are a prominent feature
of seed coat anatomy at full maturity. The cytoplasm exudes from the hourglass cells
upon imbibition with water, and a distinct peroxidase isozyme constitutes 5 to 10%
of the total soluble protein in EpEp seed coats. It is not known why the hourglass cells
accumulate large amounts of peroxidase, but the sheer abundance and relative purity
of the enzyme in soybean seed coats is significant because peroxidases are versatile
enzymes with many commercial and industrial applications. Studies of soybean seed
coat peroxidase have shown this enzyme to have useful catalytic properties and a high
degree of thermal stability, even at extremes of pH (McEldoon et al., 1995). These
properties have led to the preferred use of soybean peroxidase over horseradish
peroxidase in diagnostic assays, as an enzyme label for antigens, antibodies and
oligonucleotide probes, and in staining techniques. Johnson et al. report on the use
of soybean peroxidase for the deinking of printed waste paper (U.S. 5,270,770;
December 6, 1994) and for the biocatalytic oxidation of primary alcohols (U.S.
5,391,488; February 13, 1996). Soybean peroxidase has also been used as a
replacement for chlorine in the pulp and paper industry, or as a formaldehyde
replacement (Freiberg, 1995).

An anionic soybean peroxidase from seed coats has been purified (Gillikin and
Graham, 1991). This protein has a pI of 4.1 and an Mr of 37 kDa. A method for the
bulk extraction of peroxidase from seed hulls of soybean using a freeze-thaw technique
has also been reported (U.S. 5,491,085, February 13, 1996, Pokara and Johnson).

Lagrimini et al. (1987) disclose the cloning of a ubiquitous anionic peroxidase
in tobacco encoding a protein with an Mr of 36 kDa. This peroxidase has also been
overexpressed in transgenic tobacco plants (Lagrimini et al., 1990), and Maliyakal
discloses the expression of this gene in cotton (WO 95/08914).

Huangpu et al. (1995) reported the partial cloning of a soybean anionic seed coat
peroxidase. The 1031 bp sequence contained an open reading frame of 849 bp
encoding a 283 amino acid protein with an Mr of 30,577. The Mr of this peroxidase is
7 kDa less than would be expected for the soybean seed coat peroxidase reported
by Gillikin and Graham (1991), and it possibly represents another peroxidase isozyme
within the seed coat.
The upstream promoter sequences for two poplar peroxidases have been
described by Osakabe et al. (1995). A number of characteristic regulatory sites were
identified by comparing these sequences with known promoter elements.
Additionally, a cryptic promoter with apparent specificity for seed coat tissues was
isolated from tobacco by a promoter trapping strategy (Fobert et al., 1994). The
upstream regulatory sequences associated with the Ep gene in soybean are distinct from
these and other previously characterized promoters. The soybean Ep promoter drives
high-level expression in a cell- and tissue-specific manner. The peroxidase protein
encoded by the Ep gene accumulates in the seed coat tissues, especially in the
hourglass cells of the subepidermis. Minimal expression of the gene is detected in root
tissues.

One problem with the commercial use of soybean seed coat peroxidase is
that peroxidase production varies between soybean varieties
(Buttery and Buzzell, 1986; Freiberg, 1995). Because of the commercial interest in
soybean seed coat peroxidase, new methods of producing this enzyme are required.
Therefore, the gene responsible for the expression of the 37 kDa isozyme in soybean
seed coat was isolated and characterized.

Furthermore, novel regulatory regions obtained from the genomic DNA of 
soybean seed coat peroxidase have been isolated and characterized and are useful in 
directing the expression of genes of interest in seed coat tissues. 
SUMMARY OF THE INVENTION

The present invention relates to a DNA molecule that encodes a soybean seed 
coat peroxidase and associated DNA regulatory regions. 

This invention also embraces isolated DNA molecules comprising the nucleotide
sequence of either SEQ ID NO:1 (the cDNA encoding soybean seed coat peroxidase)
or SEQ ID NO:2 (the genomic sequence).

This invention also provides for a chimeric DNA molecule comprising a seed 
coat-specific regulatory region having nucleotides 1-1532 of SEQ ID NO:2 and a gene 
of interest under control of this DNA regulatory region. Also included within this 
invention are chimeric DNA molecules comprising genomic DNA sequences 
exemplified by nucleotides 412-1041, 1234-2263 or 2430-2691 of SEQ ID NO:2. 
Furthermore, this invention is directed to isolated DNA molecules comprising at least 

1) 24 contiguous nucleotides selected from nucleotides 1-1532 of SEQ ID 
NO:2; 

2) 32 contiguous nucleotides selected from nucleotides 412-1041 of SEQ 
ID NO:2; 

3) 23 contiguous nucleotides selected from nucleotides 1234-2263 of SEQ
ID NO:2; or

4) 22 contiguous nucleotides selected from nucleotides 2430-2691 of SEQ 
ID NO:2. 
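The "at least N contiguous nucleotides selected from nucleotides a-b" wording can be checked mechanically once SEQ ID NO:2 is in hand. The sketch below assumes 1-based, inclusive coordinates (as in "nucleotides 1-1532") and compares only the given strand; the function names and the short toy sequence are hypothetical stand-ins, since SEQ ID NO:2 itself is not reproduced here.

```python
def region(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of seq (1-based, inclusive coordinates)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:end]


def is_contiguous_fragment(candidate: str, seq: str, start: int, end: int,
                           min_len: int) -> bool:
    """True if candidate has at least min_len bases and occurs as one
    uninterrupted run inside nucleotides start..end of seq (same strand only)."""
    candidate = candidate.upper()
    return len(candidate) >= min_len and candidate in region(seq.upper(), start, end)


# Hypothetical 40 nt toy sequence standing in for SEQ ID NO:2.
toy = "ACGT" * 10
print(is_contiguous_fragment("GTACG", toy, 1, 8, 5))  # True
print(is_contiguous_fragment("GTACG", toy, 1, 8, 6))  # False: shorter than min_len
```

With the real sequence, a call such as `is_contiguous_fragment(probe, seq_id_no_2, 1, 1532, 24)` would mirror item 1) of the list; a probe designed against the opposite strand would have to be reverse-complemented first.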
The present invention also provides for vectors which comprise DNA molecules
encoding soybean seed coat peroxidase. Such a construct may include the DNA
regulatory region from SEQ ID NO:2, including nucleotides 1-1532, or at least 24
contiguous nucleotides selected from nucleotides 1-1532 of SEQ ID NO:2 in
conjunction with the seed coat peroxidase gene, or the seed coat peroxidase gene under
the control of any suitable constitutive or inducible promoter of interest.

This invention is also directed towards vectors which comprise a gene of 
interest placed under the control of a DNA regulatory element derived from the 
genomic sequence encoding soybean seed coat peroxidase. Such a regulatory element 
includes nucleotides 1-1532 of SEQ ID NO:2, or at least 24 contiguous nucleotides 
selected from nucleotides 1-1532 of SEQ ID NO:2. Elements comprising nucleotides 
412-1041, 1234-2263 or 2430-2691 of SEQ ID NO:2, or 32 contiguous nucleotides 
selected from nucleotides 412-1041 of SEQ ID NO:2, 23 contiguous nucleotides 
selected from nucleotides 1234-2263 of SEQ ID NO:2, or 22 contiguous nucleotides
selected from nucleotides 2430-2691 of SEQ ID NO:2 may also be used. 

This invention also embraces prokaryotic and eukaryotic cells comprising the 
vectors identified above. Such cells may include bacterial, insect, mammalian, and 
plant cell cultures. 

This invention also provides for transgenic plants comprising the seed coat 
peroxidase gene under control of constitutive or inducible promoters. Furthermore, 
this invention also relates to transgenic plants comprising the DNA regulatory regions
of nucleotides 1-1532 of SEQ ID NO:2 controlling a gene of interest, or comprising 
genes of interest in functional association with genomic DNA sequences exemplified 
by nucleotides 412-1041, 1234-2263 or 2430-2691 of SEQ ID NO:2. Also embraced 
by this invention are transgenic plants having regulatory regions comprising at least 24 
contiguous nucleotides selected from nucleotides 1-1532 of SEQ ID NO:2, 32 
contiguous nucleotides selected from nucleotides 412-1041 of SEQ ID NO:2, 23 
contiguous nucleotides selected from nucleotides 1234-2263 of SEQ ID NO:2, or 22 
contiguous nucleotides selected from nucleotides 2430-2691 of SEQ ID NO:2. 

This invention is also directed to a method for the production of soybean seed 
coat peroxidase in a host cell comprising: 

i) transforming the host cell with a vector comprising an oligonucleotide 
sequence that encodes soybean seed coat peroxidase; and 

ii) culturing the host cell under conditions to allow expression of the 
soybean seed coat peroxidase. 

This invention also provides for a process for producing a heterologous gene 
of interest within seed coats of a transformed plant, comprising propagating a plant 
transformed with a vector comprising a gene of interest under the control of 
nucleotides 1-1532 of SEQ ID NO:2. Furthermore, this invention embraces a process 
for producing a heterologous gene of interest within seed coats of a transformed plant, 
comprising propagating a plant transformed with a vector comprising a gene of interest 
under the control of a regulatory region comprising at least 24 nucleotides selected
from nucleotides 1-1532 of SEQ ID NO:2. 

Although the present invention is exemplified by a soybean seed coat peroxidase 
and adjacent DNA regulatory regions, in practice any gene of interest can be placed 
downstream from the DNA regulatory region for seed coat specific expression. 
BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of the invention will become more apparent from the 
following description in which reference is made to the appended drawings wherein: 

Figure 1 is the cDNA and deduced amino acid sequence of soybean seed coat
peroxidase. Nucleotides are numbered by assigning +1 to the first base of the
ATG start codon; amino acids are numbered by assigning +1 to the N-terminal
Gln residue after cleavage of the putative signal sequence. The N-terminal
signal sequence, the region of the active site, and the heme-binding domain are
underlined. The numerals I, II and III placed directly above single nucleotide
gaps in the sequence indicate the three intron splice positions. The target site
and direction of five different PCR primers are shown with dotted lines above
the nucleotide sequence. An asterisk (*) marks the translation stop codon.

Figure 2 is the genomic DNA sequence of the soybean seed coat peroxidase.

Figure 3 is a comparison of soybean seed coat peroxidase with other closely related 
plant peroxidases. The GenBank accession numbers are provided next to the 
name of the plant from which the peroxidase was isolated. The accession 
number for the soybean sequence is L78163. (A) A comparison of the nucleic 
acid sequences; (B) A comparison of the amino acid sequences. 
Figure 4 shows restriction fragment length polymorphisms between EpEp and epep
genotypes using the seed coat peroxidase cDNA as probe. Genomic DNA of
soybean lines 0X312 (epep) and 0X347 (EpEp) was digested with restriction
enzymes, separated by electrophoresis in a 0.5% agarose gel, transferred to
nylon, and hybridized with 32P-labelled cDNA encoding the seed coat
peroxidase. The sizes of the hybridizing fragments were estimated by comparison
to standards and are indicated on the right.

Figure 5 shows the structure of the Ep locus. A 17 kb fragment including the Ep
locus is illustrated schematically. A 3.3 kb portion of the gene is enlarged, and
exons and introns are represented by shaded and open boxes, respectively. The
final enlargement of the 5' region shows the location and DNA sequence
around the 87 bp deletion occurring in the ep allele of soybean line 0X312.
Nucleotides are numbered by assigning +1 to the first base of the ATG start
codon.

Figure 6 displays PCR analysis of EpEp and epep genotypes using primers derived 
from the seed coat peroxidase cDNA. Genomic DNA from soybean lines 
0X312 (epep) and 0X347 (EpEp) was used as template for PCR analysis with 
four different primer sets. Amplification products were separated by 
electrophoresis through a 0.8% agarose gel and visualized under UV light after 
staining with ethidium bromide. Genotype and primer combinations are 
indicated at the top of the figure. The sizes in base pairs of the amplified DNA
fragments are indicated on the right.

Figure 7 shows PCR analysis of an F2 population from a cross of EpEp and epep
genotypes. Genomic DNA was used as template for PCR analysis of the
parents (P) and 30 F2 individuals. The cross was derived from the soybean lines
0X312 (epep) and 0X347 (EpEp). Plants were self-pollinated, and seeds were
collected and scored for seed coat peroxidase activity. The symbols (-) and (+)
indicate low and high seed coat peroxidase activity, respectively. Primers
prx9+ and prx10- were used in the amplification reactions. Products were
separated by electrophoresis through a 0.8% agarose gel and visualized under
UV light after staining with ethidium bromide. The migration of molecular
markers and their corresponding sizes in kb are also shown (lanes M).

Figure 8 displays PCR analysis of six different soybean cultivars with primers derived 
from the seed coat peroxidase cDNA sequence. Genomic DNA was used as 
template for PCR analysis of three EpEp cultivars and three epep cultivars. 
Primers used in the amplification reactions and the sizes of the DNA products are
indicated on the left. Products were separated by electrophoresis through a
0.8% agarose gel and visualized under UV light after staining with ethidium 
bromide. 

(A) Forward and reverse primers are downstream from deletion 

(B) Forward primer anneals to site within deletion 
(C) Primers span deletion
Figure 9 shows the accumulation of peroxidase RNA in tissues of Ep and ep
plants. Figure 9(A): A comparison of peroxidase transcript abundance
in cultivars Harosoy 63 (Ep) and Marathon (ep). Seed and pod tissues
were sampled at a late stage of development corresponding to a whole
seed fresh weight of 250 mg. Root and leaf tissues were from six-week-old
plants. Autoradiograph exposed for 96 h. Figure 9(B):
Developmental expression of peroxidase in cultivar Harosoy 63 (Ep).
Flowers were sampled immediately after opening. Seed coat tissues
were sampled at four stages of development corresponding to a whole
seed fresh weight of: lane 1, 50 mg; lane 2, 100 mg; lane 3, 200 mg;
lane 4, 250 mg. Autoradiograph exposed for 20 h.
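Panels (A) to (C) of Figure 8 differ only in where the primers sit relative to the 87 bp deletion of the ep allele. The sketch below predicts the expected product for each allele, assuming 1-based inclusive coordinates on the Ep sequence, a fixed primer length, and that a primer whose site overlaps the deleted segment gives no product. All coordinates and the 20-nt primer length are hypothetical, not taken from the figures.

```python
from typing import Optional


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def amplicon_size(fwd_start: int, rev_end: int, del_start: int, del_end: int,
                  allele: str, primer_len: int = 20) -> Optional[int]:
    """Predicted PCR product length (bp) from the Ep or ep allele.

    The forward primer anneals at fwd_start..fwd_start+primer_len-1 and the
    reverse primer at rev_end-primer_len+1..rev_end (Ep coordinates); the ep
    allele lacks del_start..del_end. None means no product is expected.
    """
    size = rev_end - fwd_start + 1
    if allele == "Ep":
        return size
    if allele != "ep":
        raise ValueError("allele must be 'Ep' or 'ep'")
    fwd_site = (fwd_start, fwd_start + primer_len - 1)
    rev_site = (rev_end - primer_len + 1, rev_end)
    if overlaps(*fwd_site, del_start, del_end) or overlaps(*rev_site, del_start, del_end):
        return None  # (B) a primer site is missing from ep
    if fwd_start < del_start and rev_end > del_end:
        return size - (del_end - del_start + 1)  # (C) primers span the deletion
    return size  # (A) both primers on one side of the deletion


# Hypothetical 87 bp deletion at positions 101-187.
for label, fwd, rev in [("A", 300, 600), ("B", 120, 500), ("C", 1, 500)]:
    print(label, amplicon_size(fwd, rev, 101, 187, "Ep"),
          amplicon_size(fwd, rev, 101, 187, "ep"))
```

Under these assumptions the loop prints `A 301 301`, `B 381 None` and `C 500 413`: identical products, an ep-specific failure, and an ep product 87 bp shorter, respectively.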



CA 02211018 1997-09-19 




DESCRIPTION OF PREFERRED EMBODIMENT 

The present invention is directed to a novel oligonucleotide sequence encoding 
a seed coat peroxidase and associated DNA regulatory regions. 

According to the present invention, DNA sequences that are "substantially
homologous" include sequences that are identified under conditions of high
stringency. "High stringency" refers to Southern hybridization conditions employing
washes at 65°C with 0.1x SSC, 0.5% SDS.

By "DNA regulatory region" it is meant any region within a genomic sequence 
that has the property of controlling the expression of a DNA sequence that is operably 
linked with the regulatory region. Such regulatory regions may include promoter or 
enhancer regions, and other regulatory elements recognized by one of skill in the art. 
A segment of the DNA regulatory region is exemplified in this invention; however, as
is understood by one of skill in the art, this region may be used as a probe to identify
surrounding regions involved in the regulation of adjacent DNA, and such surrounding
regions are also included within the scope of this invention.

In the context of this disclosure, the term "promoter" or "promoter region"
refers to a sequence of DNA, usually upstream (5') of the coding sequence of a
structural gene, which controls the expression of the coding region by providing the
recognition site for RNA polymerase and/or other factors required for transcription to
start at the correct site.
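One recognizable element of many eukaryotic promoters is the TATA box, and one is reported for the Ep gene in the discussion of the genomic sequence. A rough sketch of locating candidate TATA boxes by string matching, assuming the commonly quoted consensus TATAWAW (W = A or T); real promoter analysis relies on position weight matrices and experimental mapping, and the toy sequence and function name are hypothetical.

```python
import re

# TATAWAW consensus with W written as [AT]; the lookahead reports overlapping hits.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")


def tata_box_centres(seq: str) -> list[int]:
    """1-based position of the middle (4th) base of each TATAWAW match."""
    return [m.start() + 4 for m in TATA_PATTERN.finditer(seq.upper())]


print(tata_box_centres("GGGTATAAATGGG"))  # [7]
```

Reporting the centre base matches the way the TATA box position is given for SEQ ID NO:2 ("centred on bp 1487").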



There are generally two types of promoters, inducible and constitutive. An
"inducible promoter" is a promoter that is capable of directly or indirectly activating
transcription of one or more DNA sequences or genes in response to an inducer. In
the absence of an inducer, the DNA sequences or genes will not be transcribed.
Typically the protein factor that binds specifically to an inducible promoter to activate
transcription is present in an inactive form, which is then directly or indirectly
converted to the active form by the inducer. The inducer can be a chemical agent such
as a protein, metabolite, growth regulator, herbicide or phenolic compound, or a
physiological stress imposed directly by heat, cold, salt, or toxic elements, or indirectly
through the action of a pathogen or disease agent such as a virus. A plant cell
containing an inducible promoter may be exposed to an inducer by externally applying
the inducer to the cell or plant, such as by spraying, watering, heating or similar
methods.

By "constitutive promoter" it is meant a promoter that directs the expression
of a gene throughout the various parts of a plant and continuously throughout plant
development. Examples of known constitutive promoters include those associated with
the CaMV 35S transcript and the Agrobacterium Ti plasmid nopaline synthase gene.

The chimeric gene constructs of the present invention can further comprise a
3' untranslated region. A 3' untranslated region refers to that portion of a gene
comprising a DNA segment that contains a polyadenylation signal and any other
regulatory signals capable of effecting mRNA processing or gene expression. The
polyadenylation signal is usually characterized by effecting the addition of polyadenylic
acid tracts to the 3' end of the mRNA precursor. Polyadenylation signals are
commonly recognized by the presence of homology to the canonical form 5'-AATAAA-3',
although variations are not uncommon.
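Because the canonical signal is a fixed hexamer, candidate polyadenylation signals can be located with a simple scan. A minimal sketch, assuming a forward-strand search for AATAAA plus ATTAAA (a frequently cited variant, included here as an assumption); the function name and test sequence are hypothetical.

```python
import re


def find_polya_signals(seq: str, motifs=("AATAAA", "ATTAAA")) -> list[tuple[int, str]]:
    """Return (1-based start, motif) for every overlapping match on the given strand."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        # Zero-width lookahead so that overlapping occurrences are all reported.
        for m in re.finditer(f"(?={motif})", seq):
            hits.append((m.start() + 1, motif))
    return sorted(hits)


print(find_polya_signals("CCAATAAAGGATTAAACC"))  # [(3, 'AATAAA'), (11, 'ATTAAA')]
```

A motif match only flags a candidate; whether it is used depends on downstream elements and the cleavage site, which this scan does not model.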

Examples of suitable 3' regions are the 3' transcribed non-translated regions
containing a polyadenylation signal of Agrobacterium tumour inducing (Ti) plasmid
genes, such as the nopaline synthase (Nos) gene, and plant genes such as the soybean
storage protein genes and the small subunit of the ribulose-1,5-bisphosphate
carboxylase (ssRUBISCO) gene. The 3' untranslated region from the structural gene
of the present construct can therefore be used to construct chimeric genes for
expression in plants.

The chimeric gene construct of the present invention can also include further
enhancers, either translation or transcription enhancers, as may be required. These
enhancer regions are well known to persons skilled in the art, and can include the ATG
initiation codon and adjacent sequences. The initiation codon must be in phase with
the reading frame of the coding sequence to ensure translation of the entire sequence.
The translation control signals and initiation codons can be from a variety of origins,
both natural and synthetic. Translational initiation regions may be provided from the
source of the transcriptional initiation region, or from the structural gene. The
sequence can also be derived from the promoter selected to express the gene, and can
be specifically modified so as to increase translation of the mRNA.
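The requirement that the initiation codon be in phase with the coding sequence can be illustrated by reading codons from an ATG until the first in-frame stop. A sketch under simple assumptions: the standard stop codons, a DNA string with no introns, and a hypothetical helper name and toy sequences.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(seq: str, atg_pos: int) -> Optional[int]:
    """Count codons read in frame from the ATG at 1-based atg_pos up to, but not
    including, the first stop codon; None if the frame runs off the end."""
    seq = seq.upper()
    i = atg_pos - 1
    if seq[i:i + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    count = 0
    while i + 3 <= len(seq):
        if seq[i:i + 3] in STOP_CODONS:
            return count
        count += 1
        i += 3
    return None


print(codons_before_stop("ATGAAACCCTAA", 1))   # 3
print(codons_before_stop("ATGAAAACCCTAA", 1))  # None: the extra A shifts the frame
```

For the seed coat peroxidase cDNA, a correctly phased start codon would be expected to yield 352 codons before the stop, the length of the precursor.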
To aid in identification of transformed plant cells, the constructs of this
invention may be further manipulated to include plant selectable markers. Useful
selectable markers include enzymes which provide for resistance to an antibiotic such
as gentamicin, hygromycin, kanamycin, and the like. Similarly, enzymes providing
for production of a compound identifiable by colour change, such as GUS
(β-glucuronidase), or by luminescence, such as luciferase, are useful.

Also considered part of this invention are transgenic plants containing the
chimeric gene construct of the present invention. Methods of regenerating whole
plants from plant cells are known in the art, and the method of obtaining transformed
and regenerated plants is not critical to this invention. In general, transformed plant
cells are cultured in an appropriate medium, which may contain selective agents such
as antibiotics, where selectable markers are used to facilitate identification of
transformed plant cells. Once callus forms, shoot formation can be encouraged by
employing the appropriate plant hormones in accordance with known methods, and the
shoots transferred to rooting medium for regeneration of plants. The plants may then
be used to establish repetitive generations, either from seeds or using vegetative
propagation techniques.

The constructs of the present invention can be introduced into plant cells using
Ti plasmids, Ri plasmids, plant virus vectors, direct DNA transformation,
microinjection, electroporation, etc. For reviews of such techniques see, for example,
Weissbach and Weissbach (1988) and Grierson and Corey (1988). The present
invention further includes a suitable vector comprising the chimeric gene construct.

Buttery and Buzzell (1968) showed that the amount of peroxidase activity
present in seed coats may vary substantially among different cultivars. The presence
of a single dominant gene Ep causes a high seed coat peroxidase phenotype (Buzzell
and Buttery, 1969). Homozygous recessive epep plants are ~100-fold lower in seed
coat peroxidase activity. This results from a reduction in the amount of peroxidase
enzyme present, primarily in the hourglass cells of the subepidermis (Gijzen et al.,
1993). In plants carrying the Ep gene, peroxidase is heavily concentrated in the
hourglass cells (osteosclereids). These cells form a highly differentiated cell layer with
thick, elongated secondary walls and large intercellular spaces (Baker et al., 1987).
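If Ep behaves as a single dominant gene, F2 progeny of an EpEp x epep cross should segregate close to 3 high : 1 low for seed coat peroxidase. A sketch of the Pearson chi-square goodness-of-fit test against that ratio; the function name and the counts below are illustrative only and are not data from this invention.

```python
def chi_square_3to1(high: int, low: int) -> float:
    """Pearson chi-square statistic for observed high:low counts against 3:1."""
    n = high + low
    expected_high, expected_low = 0.75 * n, 0.25 * n
    return ((high - expected_high) ** 2 / expected_high
            + (low - expected_low) ** 2 / expected_low)


CRITICAL_1DF_005 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

# Illustrative counts only.
stat = chi_square_3to1(70, 30)
print(round(stat, 3), stat < CRITICAL_1DF_005)  # 1.333 True
```

A statistic below the critical value means the counts give no evidence against 3:1 segregation; it does not by itself prove that a single gene is involved.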

Screening a seed coat cDNA library prepared from EpEp plants with a
degenerate primer derived from the active site domain of plant peroxidases resulted in
a high frequency of positive clones. Many of these clones encode identical cDNA
molecules, indicating that the corresponding mRNA is an abundant transcript in
developing seed coat tissues. The sequence of the cDNA is shown in Figure 1.
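The degenerate primer used for library screening is not given in this section. As an illustration of what "degenerate" means, the sketch below expands IUPAC ambiguity codes and counts the distinct sequences a primer represents; the primer strings and function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def expand(primer: str) -> list[str]:
    """All concrete sequences represented by the degenerate primer."""
    return ["".join(combo) for combo in product(*(IUPAC[b] for b in primer.upper()))]


print(degeneracy("ACNGRT"), expand("RY"))  # 8 ['AC', 'AT', 'GC', 'GT']
```

Degeneracy grows multiplicatively with each ambiguous position, which is why such primers are usually designed from conserved, low-redundancy stretches such as an active site.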

Previous studies on soybean seed coat peroxidase indicated that this enzyme is
heavily glycosylated and that carbohydrate contributes 18% of the mass of the
apo-enzyme (Gray et al., 1996). The seven potential glycosylation sites identified from the
amino acid sequence of the seed coat peroxidase (Figure 1) would accommodate the
five or six N-linked glycosylation sites proposed by Gray et al. (1996). The
heme-binding domain encompasses residues Asp161 to Phe171, and the acid-base catalysis
region spans Gly33 to Cys44. The two regions are highly conserved among plant
peroxidases and are centred around the functional histidine residues His169 and His40.
There are eight conserved cysteine residues in the mature protein that provide for the four
disulfide bridges found in other plant peroxidases and predicted from the crystal
structure of peanut peroxidase (Welinder, 1992; Schuller et al., 1996). Other
conserved areas include residues Cys91 to Ala105 and Val119 to Leu127, which occur in
or around helix D. The most divergent aspects of the seed coat peroxidase protein
sequence are the carboxy- and amino-terminal regions. These sequences probably
provide special targeting signals for the proper processing and delivery of the peptide
chain. It is possible that the carboxy-terminal extension of the seed coat peroxidase is
removed at maturity, as has been shown for certain barley and horseradish peroxidases
(Welinder, 1992).
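Potential N-linked glycosylation sites, such as the seven mentioned above, are conventionally identified as the sequon Asn-X-Ser/Thr, where X is any residue except proline. A sketch of that scan, assuming a one-letter protein string; positions are 1-based relative to the string supplied (so they match Figure 1 numbering only if the string begins at the mature Gln +1), and the peptide and function name are hypothetical.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr; the lookahead allows overlaps.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein: str) -> list[int]:
    """1-based positions of Asn residues that start an N-X-S/T sequon (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(n_glycosylation_sites("MNGSANPTQNVT"))  # [2, 10]
```

A sequon is necessary but not sufficient for glycosylation, which is consistent with seven potential sites accommodating five or six occupied ones.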

The molecular mass of the enzyme has been determined by denaturing gel
electrophoresis to be 37 kDa (Sessa and Anderson, 1981; Gillikin and Graham, 1991)
or 43 kDa (Gijzen et al., 1993). Analysis by mass spectrometry indicated a mass of
40,622 Da for the apo-enzyme and 33,250 Da after deglycosylation (Gray et al.,
1996). These values are in good agreement with the mass of 35,377 Da calculated from
the predicted amino acid sequence for the mature apo-protein prior to glycosylation and
other modifications. Huangpu et al. (1995) reported an anionic seed coat peroxidase
having an Mr of 30,577 Da and characterized a partial cDNA encoding this protein.
This 1031 bp cDNA contained an open reading frame of 849 bp encoding a 283 amino
acid protein. There are several differences between this reported sequence and the
sequence of this invention that are manifest at the amino acid level (see Figure 3 for
sequence comparison). The enzyme encoded by the gene reported by Huangpu et al.
is different from that of this invention, as the peroxidase of this invention has an Mr of
35,377 Da.

Genomic DNA blots probed with the seed coat peroxidase cDNA produced two
or three hybridizing fragments of varying intensity with most restriction enzyme
digestions, even though several peroxidase isozymes are present in soybean. The results
indicate that this seed coat peroxidase is encoded by a single gene that does not share
sufficient homology with most other peroxidase genes to anneal under conditions of
high stringency.

The genomic DNA sequence comprises four exons spanning bp 1533-1751
(exon 1), 2383-2574 (exon 2), 3605-3769 (exon 3) and 4033-4516 (exon 4), and three
introns spanning bp 1752-2382 (intron 1), 2575-3604 (intron 2) and 3770-4032 (intron
3) of SEQ ID NO:2. Features of the upstream regulatory region of the genomic DNA
include a TATA box centred on bp 1487 and a cap signal about 32 bp downstream,
centred on bp 1520. Also noted within the genomic sequence are three polyadenylation
signals centred on bp 4520, 4598 and 4663, and a polyadenylation site at bp 4700.
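The exon and intron boundaries of SEQ ID NO:2 given in this description can be checked with simple interval arithmetic. The sketch below assumes 1-based, inclusive coordinates, computes each feature's length, and confirms that every intron exactly fills the gap between two consecutive exons. The variable and function names are illustrative only and are not part of the original disclosure.

```python
# Exon and intron boundaries of SEQ ID NO:2 (1-based, inclusive coordinates).
EXONS = [(1533, 1751), (2383, 2574), (3605, 3769), (4033, 4516)]
INTRONS = [(1752, 2382), (2575, 3604), (3770, 4032)]


def length(interval):
    """Length of a 1-based, inclusive interval."""
    start, end = interval
    return end - start + 1


def introns_fill_gaps(exons, introns):
    """True if each intron begins right after one exon and ends right before the next."""
    return all(
        intron[0] == left[1] + 1 and intron[1] == right[0] - 1
        for left, intron, right in zip(exons, introns, exons[1:])
    )


exon_lengths = [length(e) for e in EXONS]
intron_lengths = [length(i) for i in INTRONS]

print("exon lengths:", exon_lengths)      # [219, 192, 165, 484]
print("intron lengths:", intron_lengths)  # [631, 1030, 263]
print("introns fill gaps:", introns_fill_gaps(EXONS, INTRONS))  # True
```

The intron lengths obtained this way agree with the 631, 1030 and 263 bp introns described in Example 3.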
This promoter is considered seed coat specific because the peroxidase protein
encoded by the Ep gene accumulates in the seed coat tissues, especially in the hourglass
cells of the subepidermis, and is not expressed in other tissues, apart from marginal
expression in root tissues. The same holds at the transcriptional level (see Figure 9).
The DNA regulatory regions of the genomic sequence of Figure 2 are used to control
the expression of the adjacent peroxidase gene in seed coat tissue. Such regulatory
regions include nucleotides 1-1532 of SEQ ID NO:2. Other regions of interest include
nucleotides 1752-2382, 2575-3604 and/or 3770-4032 of SEQ ID NO:2. Therefore,
other proteins of interest may be expressed in seed coat tissues by placing a gene
capable of expressing the protein of interest under the control of the DNA regulatory
elements of this invention. Genes of interest include, but are not restricted to, herbicide
resistance genes, genes encoding viral coat proteins, and genes encoding proteins that
confer biological control of pests or pathogens, such as an insecticidal protein (for
example, a B. thuringiensis toxin). Other genes include those encoding proteins that
alter the taste of the seed and/or affect the nutritive value of the soybean.

A modified DNA regulatory sequence may be obtained by introducing changes
into the natural sequence. Such modifications can be made using techniques known
to one of skill in the art, such as site-directed mutagenesis, reducing the length of the
regulatory region using endonucleases or exonucleases, or increasing its length through
the insertion of linkers or other sequences of interest. The size of a DNA regulatory
region may be reduced by removing 3' or 5' regions of the natural sequence using a
nuclease such as BAL 31 (Sambrook et al., 1989). However, any such DNA regulatory
region must still function as a seed coat specific DNA regulatory region.

Whether such modified DNA regulatory elements can act in a seed coat specific
manner may readily be determined by transforming plant cells with constructs in which
the regulatory element controls the expression of a suitable marker gene, culturing
these plants, and determining the expression of the marker gene within the seed coat as
outlined above. The efficacy of DNA regulatory elements may also be analyzed by
introducing constructs comprising a DNA regulatory element of interest operably
linked to an appropriate marker into seed coat tissues by particle bombardment, and
determining the degree of expression driven by the regulatory region, as is known to
one of skill in the art.

Two tandemly arranged genes, prxA3a and prxA4a, encoding anionic peroxidases
expressed in stems of Populus kitakamiensis have been cloned and characterized
(Osakabe et al., 1995). Both of these genomic sequences contain four exons and
three introns and encode proteins of 347 and 343 amino acids, respectively. The two
genes encode distinct isozymes with deduced Mr of 33.9 and 34.6 kDa.
Furthermore, a 532 bp promoter derived from the peroxidase gene of Armoracia
rusticana has also been reported (Toyobo KK, JP 4,126,088, April 27, 1992).
However, a GenBank search revealed no substantial similarity between the promoter
region or introns 1, 2 and 3 of this invention and those within the literature.
Digestion of the genomic DNA with BamHI or SacI revealed restriction
fragment length polymorphisms that distinguished EpEp and epep genotypes. Although
the XbaI digestion did not produce a readily detectable polymorphism, the hybridizing
fragment in both genotypes was ~14 kb, and a size difference of 0.3 kb is beyond the
resolving power of the separation for fragments this large. Sequence analysis of EpEp
and epep genotypes indicates that the mutant ep allele is missing 87 bp of sequence at
the 5' end of the structural gene. This would account for the drastically reduced
amounts of peroxidase enzyme present in seed coats of epep plants, since the deletion
includes the translation start codon and the entire N-terminal signal sequence.
However, the 87 bp deletion cannot account for the differences observed in the RFLP
analysis, since the missing fragment does not include a BamHI site and is much smaller
than the 0.3 kb polymorphism detected in the SacI digestion. Thus, other genetic
rearrangements must occur in the vicinity of the ep locus that lead to these
polymorphisms.

The results shown here indicate that the mutation causing low seed coat 
peroxidase activity occurs in the structural gene encoding the enzyme. This mutation 
is an 87 bp deletion in the 5' region of the gene encompassing the translation start site. 
Several different low peroxidase cultivars share a similar mutation in the same area, 
suggesting that the recessive ep alleles have a common origin or that the region is 
prone to spontaneous deletions or rearrangements. 
Due to the industrial interest in soybean seed coat peroxidase, alternative sources
for the production of this enzyme are needed. The DNA of this invention, encoding
the soybean seed coat peroxidase under the control of a suitable promoter and
expressed within a host of interest, can be used for the preparation of recombinant
soybean seed coat peroxidase enzyme.

Soybean seed coat peroxidase has been characterized as a lignin-type peroxidase
with industrially significant properties: high activity and stability under acidic
conditions; wide substrate specificity; catalytic properties equivalent to those of
Phanerochaete chrysosporium lignin peroxidase (the enzyme currently preferred for
treatment of industrial waste waters; Wick 1995), while being at least 150-fold more
stable; and greater stability than horseradish peroxidase, which is also used in
industrial effluent treatments and medical diagnostic kits (McEldoon et al., 1995).
These properties make soybean peroxidase useful in industrial applications for the
degradation of natural aromatic polymers, including lignin and coal (McEldoon et al.,
1995), and favour its use over horseradish peroxidase in medical diagnostic tests, as an
enzyme label for antigens, antibodies and oligonucleotide probes, and in staining
techniques (Wick 1995). Soybean peroxidase is also used in the deinking of printed
waste paper (Johnson et al., U.S. 5,270,770; December 6, 1994) and for the
biocatalytic oxidation of primary alcohols (Johnson et al., U.S. 5,391,488; February
13, 1996). Soybean peroxidase has also been used as a replacement for chlorine in the
pulp and paper industry, to remove chlorine-containing, phenolic or aromatic
amine-containing pollutants from industrial waste waters (Wick 1995), and as a
formaldehyde replacement (Freiberg, 1995) in adhesives, abrasives, and protective
coatings (e.g. varnishes and resins; Wick 1995).

Furthermore, the seed coat peroxidase gene may be expressed in an organ- or
tissue-specific manner within a plant. For example, the quality and strength of cotton
fibre can be improved through the over-expression of cotton or horseradish peroxidase
placed under the control of a fibre-specific promoter (Maliyakal, WO 95/08914; April
6, 1995).

Similarly, seed-specific DNA regulatory regions of this invention may be used 
to control expression of genes of interest such as: 

i) genes encoding herbicide resistance, or

ii) genes conferring biological control of insects or pathogens (e.g. B. thuringiensis), or

iii) genes encoding viral coat proteins to protect against viral infections, or

iv) genes encoding proteins of commercial interest (e.g. pharmaceutical), or

v) genes encoding proteins that alter the nutritive value, taste, or processing of seeds,

within the seed coat of plants.

While this invention is described in detail with particular reference to preferred 
embodiments thereof, said embodiments are offered to illustrate but not to limit the 
invention. 
EXAMPLES

Plant material 

All soybean (Glycine max [L.] Merr.) cultivars and breeding lines were from the
collection at Agriculture Canada, Harrow, Ontario.

Seed Coat cDNA Library Construction and Screening

High seed coat peroxidase (EpEp) soybean cultivar Harosoy 63 plants were
grown in field plots outdoors. Pods were harvested 35 days after flowering and seeds
in the mid-to-late developmental stage were excised. The average fresh mass was 250
mg per seed. Seed coats were dissected and immediately frozen in liquid nitrogen. The
frozen tissue was lyophilized, and total RNA was extracted in 100 mM Tris-HCl pH 9.0,
20 mM EDTA, 4% (w/v) sarkosyl, 200 mM NaCl, and 16 mM DTT, and precipitated
with LiCl using the standard phenol/chloroform method described by Wang and
Vodkin (1994). The poly(A)+ RNA was purified on oligo(dT) cellulose columns prior
to cDNA synthesis, size selection, ligation into the λ ZAP Express vector, and
packaging according to the manufacturer's instructions (Stratagene). A degenerate
oligonucleotide with the 5' to 3' sequence TT(C/T)CA(C/T)GA(C/T)TG(C/T)TT(C/T)GT
was 5' end labelled to high specific activity and used as a probe to isolate peroxidase
cDNA clones (Sambrook et al., 1989). Duplicate plaque lifts were made to nylon filters
(Amersham), UV fixed, and prehybridized at 36 °C for 3 h in 6 x SSC, 20 mM
Na2HPO4 (pH 6.8), 5 x Denhardt's, 0.4% SDS, and 500 µg/mL salmon sperm DNA.
Hybridization was in the same buffer, without Denhardt's, at 36 °C for 16 h. Filters
were washed quickly with several changes of 6 x SSC and 0.1% SDS, first at room
temperature and finally at 40 °C, prior to autoradiography for 16 h at -70 °C with an
intensifying screen.
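The degenerate probe above is a mixture of oligonucleotides, one for each combination of the bracketed alternatives. The sketch below, a minimal illustration rather than anything described in the original, expands the "(C/T)" notation with itertools.product and checks that the corresponding stretch of SEQ ID NO:1 (the codons for Phe-His-Asp-Cys-Phe followed by the first two bases of the Val codon) is one of the variants. The function name expand_degenerate is hypothetical.

```python
import itertools
import re

# Degenerate probe as written in the text: (C/T) marks a position that is
# either C or T in the synthesized oligonucleotide mixture.
PROBE = "TT(C/T)CA(C/T)GA(C/T)TG(C/T)TT(C/T)GT"


def expand_degenerate(pattern):
    """Return every concrete sequence encoded by a '(X/Y)' degenerate pattern."""
    # Split into fixed runs and bracketed alternatives: ['TT', '(C/T)', 'CA', ...]
    tokens = re.findall(r"\([ACGT/]+\)|[ACGT]+", pattern)
    choices = [
        tok[1:-1].split("/") if tok.startswith("(") else [tok]
        for tok in tokens
    ]
    return ["".join(parts) for parts in itertools.product(*choices)]


variants = expand_degenerate(PROBE)
print(len(variants))               # 32 (two choices at five positions)
print({len(v) for v in variants})  # {17}
# Matching stretch of SEQ ID NO:1 (codons 67-72: Phe-His-Asp-Cys-Phe-Val)
print("TTTCATGATTGCTTTGT" in variants)  # True
```

Because each bracket covers the two third-position choices for Phe, His, Asp and Cys, the 32-fold mixture anneals to any coding variant of this conserved peptide.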

Genomic DNA Isolation, Library Construction, and DNA Blot Analysis

Soybean genomic DNA was isolated from leaves of greenhouse-grown plants
or from etiolated seedlings grown in vermiculite. Plant tissue was frozen in liquid
nitrogen and lyophilized before extraction and purification of DNA according to the
method of Dellaporta et al. (1983). Restriction enzyme digestion of 30 µg DNA,
separation on 0.5% agarose gels and blotting to nylon membranes followed standard
protocols (Sambrook et al., 1989). For construction of the genomic library, DNA
purified from Harosoy 63 leaf tissue was partially digested with BamHI and ligated into
the λ FIX II vector (Stratagene). Gigapack XL packaging extract (Stratagene) was used
to select for inserts of 9 to 22 kb. After library amplification, duplicate plaque lifts
were hybridized to the cDNA probe.

Blots or filter lifts were prehybridized for 2 h at 65 °C in 6 x SSC, 5 x
Denhardt's, 0.5% SDS, and 100 µg/mL salmon sperm DNA. Radiolabelled cDNA
probe (20 to 50 ng) was prepared using the Ready-to-Go labelling kit (Pharmacia) and
32P-dCTP (Amersham). Unincorporated 32P-dCTP was removed by spin column
chromatography before adding radiolabelled cDNA to the hybridization buffer
(identical to prehybridization buffer without Denhardt's). Hybridization was for 20 h
at 65 °C. Membranes were washed twice for 15 min at room temperature with 2 x SSC,
0.5% SDS, followed by two 30 min washes at 65 °C with 0.1 x SSC, 0.5% SDS.
Autoradiography was for 20 h at -70 °C using an intensifying screen and X-OMAT film
(Kodak).


DNA Sequencing 

Sequencing of DNA was performed using dye-labelled terminators and Taq-FS
DNA polymerase (Perkin-Elmer). The PCR protocol consisted of 25 cycles of a 30 sec
melt at 96 °C, 15 sec annealing at 50 °C, and 4 min extension at 60 °C. Samples were
analyzed on an Applied Biosystems 373A Stretch automated DNA sequencer.

Polymerase Chain Reaction 

PCR amplifications contained 1 ng template DNA, 5 pmol of each primer, 1.5
mM MgCl2, 0.15 mM deoxynucleotide triphosphate mix, 10 mM Tris-HCl, 50 mM
KCl, pH 8.3, and 1 unit of Taq polymerase (Gibco BRL) in a total volume of 25 µL.
Reactions were performed in a Perkin-Elmer 480 thermal cycler. After an initial 2 min
denaturation at 94 °C, there were 35 cycles of 1 min denaturation at 94 °C, 1 min
annealing at 52 °C, and 2 min extension at 72 °C. A final 7 min extension at 72 °C
completed the program. The following primers were used for PCR analysis of genomic
DNA:
prx2+ CTTCCAAATATCAACTCAAT

prx6- TAAAGTTGGAAAAGAAAGTA

prx9+ ATGCATGCAGGTTTTTCAGT

prx10- TTGCTCGCTTTCTATTGTAT

prx12+ TCTTCGATGCTTCTTTCACC

prx29+ CATAAACAATACGTACGTGAT
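The primer list can be sanity-checked in a few lines. The sketch below is an illustration only: it reports GC content and a rough melting temperature from the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is a common rule of thumb and not a method stated in the text (the protocol above simply uses 52 °C annealing). It also assumes that "+" marks a sense-strand primer and "-" an antisense primer, and tests this against a 63 nt excerpt of SEQ ID NO:1 copied from the sequence listing. All names are hypothetical.

```python
# Primer sequences as listed above (5' to 3').
PRIMERS = {
    "prx2+": "CTTCCAAATATCAACTCAAT",
    "prx6-": "TAAAGTTGGAAAAGAAAGTA",
    "prx9+": "ATGCATGCAGGTTTTTCAGT",
    "prx10-": "TTGCTCGCTTTCTATTGTAT",
    "prx12+": "TCTTCGATGCTTCTTTCACC",
    "prx29+": "CATAAACAATACGTACGTGAT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (degrees C): 2*(A+T) + 4*(G+C)."""
    gc = sum(base in "GC" for base in seq)
    return 2 * (len(seq) - gc) + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name:7s} {len(seq)} nt  GC {gc_percent(seq):.0f}%  Tm ~{wallace_tm(seq)} C")

# Excerpt of SEQ ID NO:1 (codons 81-101). A "+" primer should match this
# strand directly; a "-" primer should match it as its reverse complement.
EXCERPT = "CTGAACAACACTGATACAATAGAAAGCGAGCAAGATGCACTTCCAAATATCAACTCAATAAGA"
print(PRIMERS["prx2+"] in EXCERPT)                        # True
print(reverse_complement(PRIMERS["prx10-"]) in EXCERPT)  # True
```

In this excerpt the prx10- site lies immediately upstream of the prx2+ site, consistent with the two primers facing in opposite directions.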

RNA Isolation 

For isolation of RNA, tissue was harvested from greenhouse-grown plants,
dissected, frozen in liquid nitrogen, and lyophilized prior to extraction. Total RNA was
purified from seed coats, embryos, pods, leaves, and flowers using the standard
phenol/chloroform method (Sambrook et al., 1989). This method did not give good
yields of RNA from roots, so this tissue was extracted with TRIzol reagent
(Gibco BRL) and total RNA was purified according to the manufacturer's instructions,
with an additional phenol/chloroform extraction step. The amount of RNA was
estimated by measuring absorbance at 260 and 280 nm, and by electrophoretic
separation in formaldehyde gels followed by staining with ethidium bromide and
comparison to known standards. Total RNA (10 µg per sample) was subjected to
electrophoresis through a 1% agarose gel containing formaldehyde and then stained
with ethidium bromide to ensure equal loading of samples. The gel was blotted to
nylon (Hybond-N, Amersham) according to standard methods and the RNA was fixed
to the membrane by UV cross-linking.
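The text estimates RNA amounts from absorbance at 260 and 280 nm but does not state the conversion used. The sketch below assumes the conventional factor of about 40 µg/mL of RNA per A260 unit (1 cm path length) and uses hypothetical readings to show how the volume for a 10 µg lane could be calculated. The function names and the example readings are illustrative assumptions, not data from this work.

```python
# Conventional conversion factor: an A260 of 1.0 corresponds to roughly
# 40 ug/mL of single-stranded RNA (1 cm path). Assumed, not from the text.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ul(a260, dilution_factor):
    """RNA concentration of the undiluted sample, in ug/uL."""
    ug_per_ml = a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor
    return ug_per_ml / 1000.0


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 2.0 are usually taken to indicate clean RNA."""
    return a260 / a280


def volume_for_load(target_ug, conc_ug_per_ul):
    """Volume (uL) that delivers target_ug of RNA to one gel lane."""
    return target_ug / conc_ug_per_ul


# Hypothetical readings for one sample diluted 1:50 before measurement.
conc = rna_concentration_ug_per_ul(a260=0.5, dilution_factor=50)
print(conc)                                # 1.0 (ug/uL)
print(volume_for_load(10.0, conc))         # 10.0 (uL for a 10 ug lane)
print(round(purity_ratio(0.5, 0.25), 2))   # 2.0
```

The ethidium bromide staining described above remains the check on equal loading; the absorbance calculation only sets the nominal amount per lane.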
Seed Coat Peroxidase Assays

Because the seed testa is derived from maternal tissue, F3 seed was assayed for
peroxidase activity to score the phenotype of the F2 population. The seeds were
briefly soaked in water, and the seed coat was dissected from the embryo and placed in
a vial. Ten drops (~500 µL) of 0.5% guaiacol were added and the sample was left to
stand for 10 min before one drop (~50 µL) of 0.1% H2O2 was added. An immediate
change in colour of the solution, from clear to red, indicates a positive result and high
seed coat peroxidase activity.

Example 1: The Seed Coat Peroxidase cDNA and Genomic DNA Sequences

To isolate the seed coat peroxidase transcript, a cDNA library was constructed
from developing seed coat tissue of the EpEp cultivar Harosoy 63. The primary
library contained 10^6 recombinant plaque forming units and was amplified prior to
screening. A degenerate 17-mer oligonucleotide corresponding to the conserved active
site domain of plant peroxidases was used to probe the library. In screening 10,000
plaque forming units, 12 positive clones were identified. The cDNA insert size of the
clones ranged from 0.5 to 2.5 kb, but six clones shared a common insert size of 1.3
kb. These six clones (soyprx03, soyprx05, soyprx06, soyprx11, soyprxH, and
soyprx14) were chosen for further characterization because the 1.3 kb insert size
matched the expected peroxidase transcript size. Sequence analysis of the six clones
showed that they contained identical cDNA transcripts encoding a peroxidase and that
each resulted from an independent cloning event, since the junction between the
cloning vector and the transcript was different in all cases.

Since it was not clear that the 5' end of the transcript was complete in any of the
cDNA clones isolated, the structural gene corresponding to the seed coat peroxidase
was isolated from a Harosoy 63 genomic library. A partial BamHI digest of genomic
DNA was used to construct the library, and more than 10^6 plaque forming units were
screened using the cDNA probe. A positive clone, G25-2-1-1-1, containing a 17 kb
insert was identified, and a 4.7 kb region encoding the peroxidase was sequenced
(SEQ ID NO:2). This region includes 1532 nucleotides of the 5' region of the
peroxidase gene.

The genomic sequence matched the cDNA sequence except for three introns 
encoded within the gene. The genomic sequence also revealed two additional 
translation start codons, beginning one bp and 10 bp upstream from the 5' end of the 
longest cDNA transcript isolated. Figure 1 shows the deduced cDNA sequence. The 
open reading frame of 1056 bp encodes a 352 amino acid protein of 38,106 Da. A 
heme-binding domain, a peroxidase active site signature sequence, and seven potential 
N-glycosylation sites were identified from the deduced amino acid sequence. The first 
26 amino acid residues conform to a membrane spanning domain. Cleavage of this 
putative signal sequence releases a mature protein of 326 residues with a mass of
35,377 Da and an estimated pI of 4.4.
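The split between signal sequence and mature protein can be reproduced by translating the 5' end of SEQ ID NO:1. The sketch below builds the standard genetic code (NCBI table 1) from codons ordered T, C, A, G, translates the first 32 codons copied from the sequence listing, and slices at residue 26 as stated in the text. It does not predict signal peptides; the cleavage position is taken from the description, and the names are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI table 1), built from codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# First 96 nt (codons 1-32) of SEQ ID NO:1, copied from the sequence listing.
CDNA_5PRIME = (
    "ATGGGTTCCATGCGTCTATTAGTAGTGGCATTGTTGTGTGCATTTGCT"
    "ATGCATGCAGGTTTTTCAGTCTCTTATGCTCAGCTTACTCCTACGTTC"
)

SIGNAL_LENGTH = 26  # residues, as stated in the text
PRECURSOR_LENGTH = 352


def translate(dna):
    """Translate a DNA string in reading frame 1, one letter per codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


peptide = translate(CDNA_5PRIME)
signal, mature_start = peptide[:SIGNAL_LENGTH], peptide[SIGNAL_LENGTH:]
print(signal)                               # MGSMRLLVVALLCAFAMHAGFSVSYA
print(mature_start)                         # QLTPTF
print(PRECURSOR_LENGTH - SIGNAL_LENGTH)     # 326 residues in the mature protein
```

The translated residues match the Met-Gly-Ser ... Gln-Leu-Thr-Pro-Thr-Phe reading given in the sequence listing, so the mature protein would begin at Gln27 of the precursor.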
Relevant features of the genomic fragment (Figure 2) include four exons at bp
192-411 (exon 1; 1533-1751 of SEQ ID NO:2), 1042-1233 (exon 2; 2383-2574 of
SEQ ID NO:2), 2263-2429 (exon 3; 3605-3769 of SEQ ID NO:2) and 2692-3174
(exon 4; 4033-4516 of SEQ ID NO:2) and three introns at bp 412-1041 (intron 1;
1752-2382 of SEQ ID NO:2), 1234-2263 (intron 2; 2575-3604 of SEQ ID NO:2) and
2430-2691 (intron 3; 3770-4032 of SEQ ID NO:2). The 1532 bp regulatory region of
the genomic DNA includes a TATA box centred on bp 1487 and a cap signal about 32 bp
downstream, centred on bp 1520 of SEQ ID NO:2. Also noted within the genomic
sequence are three polyadenylation signals centred on bp 4520, 4598 and 4663, and a
polyadenylation site at bp 4700 of SEQ ID NO:2.


Figure 3 illustrates the relationship between the soybean seed coat peroxidase
and other selected plant peroxidases. The soybean sequence is most closely related to
four peroxidase cDNAs isolated from alfalfa (see Figure 3), sharing 65 to 67%
identity at the amino acid level with the alfalfa proteins (X90693, X90694, X90692,
el-Turk et al. 1996; L36156, Abrahams et al. 1994). When compared with other plant
peroxidases, soybean seed coat peroxidase exhibits 60 to 65% identity with poplar
(D30653 and D30652, Osakabe et al. 1994) and flax (L0554, Omann and Tyson 1995)
peroxidases; 50 to 60% identity with horseradish (M37156, Fujiyama et al. 1988),
tobacco (D11396, Osakabe et al. 1993), and cucumber (M91373, Rasmussen et al. 1992)
peroxidases; and 49% identity with barley (L36093, Scott-Craig et al. 1994), wheat
(X85228, Baga et al. 1995) and tobacco (L02124, Diaz-De-Leon et al. 1993)
peroxidases.
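The identity values above are quoted without stating how they were computed. One common convention, assumed here and not confirmed by the source, counts identical residues over the columns of a pairwise alignment in which neither sequence has a gap. The sketch below implements that convention on a hypothetical toy alignment; other conventions (dividing by the full alignment length, or by the length of the shorter sequence) give different numbers for the same alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gap columns are excluded from the denominator
        compared += 1
        identical += x == y
    return 100.0 * identical / compared


# Hypothetical toy alignment, for illustration only (not real peroxidase data).
print(round(percent_identity("MGSMRLL-VA", "MGAMRLLKVA"), 1))  # 88.9
```

The function takes an alignment as input; producing the alignment itself (for example with a global alignment program) is a separate step not shown here.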



CA 02211018 1997-09-19 




A search of the GenBank database with the promoter region (1-1532 of SEQ ID
NO:2) found no similar sequences.

Example 2: DNA Blot Analysis Using the Seed Coat Peroxidase cDNA Probe Reveals
Restriction Fragment Length Polymorphisms Between EpEp and epep Genotypes

Genomic DNA blots of OX347 (EpEp) and OX312 (epep) plants were
hybridized with 32P-labelled cDNA to estimate the copy number of the seed coat
peroxidase gene and to determine whether this locus is polymorphic between the two
genotypes. Figure 4 shows the hybridization patterns after digestion with BamHI, XbaI,
and SacI. Restriction fragment length polymorphisms are clearly visible in the BamHI
and SacI digestions. The BamHI digestion produced a strongly hybridizing 17 kb
fragment and a faint 3.4 kb fragment in the EpEp genotype. The 3.4 kb BamHI
fragment is visible in the epep genotype, but the 17 kb fragment has been replaced by
a signal at >20 kb. The SacI digestion resulted in detection of three fragments in EpEp
and epep plants. At least two fragments were expected here, since the cDNA sequence
has a SacI site within the open reading frame. However, the smallest and most strongly
hybridizing of these fragments is 5.2 kb in EpEp plants and 4.9 kb in epep plants.
Digestion with XbaI produced hybridizing fragments of ~14 kb and 7.8 kb for both
genotypes, with the larger fragment showing a stronger signal.
Example 3: A Deletion Mutation Occurs in the Recessive ep Locus

The structural gene encoding the seed coat peroxidase is schematically
illustrated in Figure 5. The 17 kb BamHI fragment encompassing the gene includes 191
bp of sequence upstream from the translation start codon, three introns of 631 bp, 1030
bp, and 263 bp, and 13 kb of sequence downstream from the polyadenylation site. The
arrangement of four exons and three introns and the placement of introns within the
sequence are similar to those described for other plant peroxidases (Simon, 1992;
Osakabe et al. 1995).

Primers were designed from the DNA sequence to compare EpEp and epep
genotypes by PCR analysis. Figure 6 shows PCR amplification products from four
different primer combinations using OX312 (epep) and OX347 (EpEp) genomic DNA
as template. The primer annealing site for prx29+ begins 182 bp upstream from the
ATG start codon; the remaining primer sites are shown in Figure 1. Amplification with
primers prx2+ and prx6-, and with prx12+ and prx10-, produced the expected
products of 1.9 kb and 860 bp, respectively, regardless of the Ep/ep genotype of the
template DNA. However, PCR amplification with primers prx9+ and prx10-, and with
prx29+ and prx10-, generated the expected products only when template DNA was
from plants carrying the dominant Ep allele. When template DNA was from an epep
genotype, no product was detected using primers prx9+ and prx10-, and a smaller
product was amplified with primers prx29+ and prx10-. The products resulting from
amplification of OX312 or OX347 template DNA with primers prx29+ and prx10-
were directly sequenced and compared. The polymorphism is due to an 87 bp deletion
occurring within this DNA fragment in OX312 plants, as shown in Figure 5. This
deletion begins nine bp upstream from the translation start codon and includes 78 bp
of sequence at the 5' end of the open reading frame, including the prx9+ primer
annealing site.
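The arithmetic behind this deletion can be made explicit. The sketch below assumes, as in the SEQ ID NO:2 feature table, that the ATG start codon begins at bp 1533 and that nucleotide 1 of the cDNA corresponds to that position; the prx9+ site (cDNA nt 49-68) is located by matching the primer to the listed cDNA sequence. It shows that 9 bp upstream plus 78 bp of open reading frame make up the 87 bp deletion, and that the 78 bp are exactly the 26 codons of the signal sequence. The constant names are illustrative.

```python
# Positions in SEQ ID NO:2 (1-based, inclusive), taken from the description.
START_CODON = 1533     # first base of the ATG (CDS and exon 1 begin here)
SIGNAL_CODONS = 26     # signal peptide length in residues
DELETION_LENGTH = 87   # size of the ep deletion
UPSTREAM_OF_ATG = 9    # deletion begins this many bp before the ATG

del_start = START_CODON - UPSTREAM_OF_ATG
del_end = del_start + DELETION_LENGTH - 1
orf_bases_lost = del_end - START_CODON + 1

signal_end = START_CODON + 3 * SIGNAL_CODONS - 1
# prx9+ matches cDNA nt 49-68, i.e. SEQ ID NO:2 bp 1581-1600 (cDNA nt 1 = bp 1533).
prx9_site = (START_CODON + 48, START_CODON + 67)

print(del_start, del_end)                    # 1524 1610
print(orf_bases_lost, orf_bases_lost // 3)   # 78 26
print(del_end >= signal_end)                 # True: whole signal sequence removed
print(del_start <= prx9_site[0] and prx9_site[1] <= del_end)  # True
```

Because the lost prx9+ site lies entirely inside the deleted interval, the absence of a prx9+/prx10- product in epep plants follows directly from these coordinates.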


To test whether this deletion mutation cosegregates with the seed coat
peroxidase phenotype, genomic DNA from an F2 population segregating at the Ep locus
was amplified using primers prx9+ and prx10-, and F3 seed was tested for seed coat
peroxidase activity. Figure 7 shows the results from this analysis. Of the 30 F2
individuals tested, all 23 that were high in seed coat peroxidase activity produced the
expected 860 bp PCR amplification product. The remaining seven F2 plants, with low
seed coat peroxidase activity, produced no detectable PCR amplification products.
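The text reports the F2 counts (23 high, 7 low) without a statistical test. As an illustration only, the sketch below derives the 3:1 expectation for a single dominant allele by enumerating the gametes of an Ep/ep F1, then applies a chi-square goodness-of-fit test to the reported counts. The test, the tabulated critical value and the names used are additions for illustration, not an analysis from the original work.

```python
from collections import Counter
from itertools import product

# Ep/ep F1 plants self-pollinate; each gamete carries Ep or ep with equal chance.
GAMETES = ["Ep", "ep"]
f2_genotypes = Counter("".join(sorted(pair)) for pair in product(GAMETES, repeat=2))
# EpEp : Epep : epep = 1 : 2 : 1

# Ep is dominant, so any genotype other than epep gives high seed coat peroxidase.
total = sum(f2_genotypes.values())
p_high = sum(n for g, n in f2_genotypes.items() if g != "epep") / total  # 0.75

# F2 counts reported above: 23 high, 7 low.
observed = {"high": 23, "low": 7}
n = sum(observed.values())
expected = {"high": n * p_high, "low": n * (1 - p_high)}
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

CHI2_CRITICAL_1DF_0_05 = 3.841  # tabulated value, 1 degree of freedom, alpha = 0.05
print(round(chi2, 3))                  # 0.044
print(chi2 < CHI2_CRITICAL_1DF_0_05)  # True: consistent with a 3:1 ratio
```

Since the seed coat is maternal tissue, each F3 seed coat reports the genotype of its F2 parent, which is why the F2 plants can be scored this way.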

Finally, to determine whether the OX312 (epep) and OX347 (EpEp) breeding lines are
representative of soybean cultivars that differ in seed coat peroxidase activity, several
cultivars were tested by PCR analysis using primer combinations targeted to the Ep
locus. Figure 8 shows results from this analysis of six different soybean cultivars, three
each of the homozygous dominant EpEp and recessive epep genotypes. As observed
with OX312 and OX347, amplification products of the expected size were produced
with primers prx12+ and prx10- regardless of the genotype, whereas epep genotypes
yielded no product with primers prx9+ and prx10- and a smaller fragment with primers
prx29+ and prx10-.
Example 4: Developmental Pattern of Expression of the Ep Gene

The seed coat peroxidase mRNA levels were determined by hybridizing RNA gel blots
with radiolabelled cDNA probe. Figure 9 illustrates the transcript abundance in
various tissues of epep and EpEp plants. The mRNA accumulated to high levels in seed
coat tissues of EpEp plants, especially in the later stages of development, when whole
seed fresh weight exceeded 50 mg. Low levels of transcript could also be detected in
root tissues but not in the flower, embryo, pod or leaf. The transcript could also be
detected in seed coat and root tissues of epep plants, but in drastically reduced amounts
compared to the EpEp genotype. The reduced amounts of peroxidase mRNA present in
seed coats of epep plants indicate that transcription and/or the stability of the
resulting mRNA is severely affected. The Ep gene has a TATA box and a 5' cap signal
beginning 47 bp and 15 bp, respectively, upstream from the translation start codon.
The 87 bp deletion in the ep allele extends into the 5' cap signal and therefore could
interfere with transcript processing. Regardless, any resulting transcript will not be
properly translated, since the AUG initiation codon and the entire amino-terminal
signal sequence are deleted from the ep allele. Without wishing to be bound by theory,
the lack of peroxidase accumulation in the seed coats of epep plants appears to be due
to at least two factors, greatly reduced transcript levels and ineffective translation,
resulting from mutation of the structural gene encoding the enzyme. In summary, the
results indicate that the Ep gene regulatory elements can drive high-level expression in
a tightly coordinated, tissue- and developmentally specific manner.
All scientific publications and patent documents are incorporated herein by
reference.

The present invention has been described with regard to preferred 
embodiments. However, it will be obvious to persons skilled in the art that a number 
of variations and modifications can be made without departing from the scope of the 
invention as described in the following claims. 
SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: 
(A) NAME: Mark Gijzen

(B) STREET: 848 Princess Avenue 

(C) CITY: London 

(D) STATE: Ontario 

(E) COUNTRY: Canada 

(F) POSTAL CODE (ZIP): N5W 3M4

(ii) TITLE OF INVENTION: Seed Coat DNA Regulatory Region and 
Peroxidase 

(iii) NUMBER OF SEQUENCES: 2

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1244 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: cDNA



(iii) HYPOTHETICAL: NO 



(iv) ANTI-SENSE: NO 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1056



(ix) FEATURE:

(A) NAME/KEY: sig_peptide 
(B) LOCATION: 1..77



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

ATG GGT TCC ATG CGT CTA TTA GTA GTG GCA TTG TTG TGT GCA TTT GCT 48
Met Gly Ser Met Arg Leu Leu Val Val Ala Leu Leu Cys Ala Phe Ala
1 5 10 15

ATG CAT GCA GGT TTT TCA GTC TCT TAT GCT CAG CTT ACT CCT ACG TTC 96
Met His Ala Gly Phe Ser Val Ser Tyr Ala Gln Leu Thr Pro Thr Phe
20 25 30 

TAC AGA GAA ACA TGT CCA AAT CTG TTC CCT ATT GTG TTT GGA GTA ATC 144
Tyr Arg Glu Thr Cys Pro Asn Leu Phe Pro Ile Val Phe Gly Val Ile
35 40 45 



TTC GAT GCT TCT TTC ACC GAT CCC CGA ATC GGG GCC AGT CTC ATG AGG 192
Phe Asp Ala Ser Phe Thr Asp Pro Arg Ile Gly Ala Ser Leu Met Arg
50 55 60 

CTT CAT TTT CAT GAT TGC TTT GTT CAA GGT TGT GAT GGA TCA GTT TTG 240 
Leu His Phe His Asp Cys Phe Val Gln Gly Cys Asp Gly Ser Val Leu
65 70 75 80 

CTG AAC AAC ACT GAT ACA ATA GAA AGC GAG CAA GAT GCA CTT CCA AAT 288 
Leu Asn Asn Thr Asp Thr Ile Glu Ser Glu Gln Asp Ala Leu Pro Asn
85 90 95 

ATC AAC TCA ATA AGA GGA TTG GAC GTT GTC AAT GAC ATC AAG ACA GCG 336 
Ile Asn Ser Ile Arg Gly Leu Asp Val Val Asn Asp Ile Lys Thr Ala
100 105 110



GTG GAA AAT AGT TGT CCA GAC ACA GTT TCT TGT GCT GAT ATT CTT GCT 384
Val Glu Asn Ser Cys Pro Asp Thr Val Ser Cys Ala Asp Ile Leu Ala
115 120 125



ATT GCA GCT GAA ATA GCT TCT GTT CTG GGA GGA GGT CCA GGA TGG CCA 432 
Ile Ala Ala Glu Ile Ala Ser Val Leu Gly Gly Gly Pro Gly Trp Pro
130 135 140 

GTT CCA TTA GGA AGA AGG GAC AGC TTA ACA GCA AAC CGA ACC CTT GCA 480 
Val Pro Leu Gly Arg Arg Asp Ser Leu Thr Ala Asn Arg Thr Leu Ala 
145 150 155 160

AAT CAA AAC CTT CCA GCA CCT TTC TTC AAC CTC ACT CAA CTT AAA GCT 528 
Asn Gln Asn Leu Pro Ala Pro Phe Phe Asn Leu Thr Gln Leu Lys Ala
165 170 175 



TCC TTT GCT GTT CAA GGT CTC AAC ACC CTT GAT TTA GTT ACA CTC TCA 576
Ser Phe Ala Val Gln Gly Leu Asn Thr Leu Asp Leu Val Thr Leu Ser
180 185 190



GGT GGT CAT ACG TTT GGA AGA GCT CGG TGC AGT ACA TTC ATA AAC CGA 624
Gly Gly His Thr Phe Gly Arg Ala Arg Cys Ser Thr Phe Ile Asn Arg
195 200 205



TTA TAC AAC TTC AGC AAC ACT GGA AAC CCT GAT CCA ACT CTG AAC ACA 672
Leu Tyr Asn Phe Ser Asn Thr Gly Asn Pro Asp Pro Thr Leu Asn Thr
210 215 220



ACA TAC TTA GAA GTA TTG CGT GCA AGA TGC CCC CAG AAT GCA ACT GGG 720
Thr Tyr Leu Glu Val Leu Arg Ala Arg Cys Pro Gln Asn Ala Thr Gly
225 230 235 240

GAT AAC CTC ACC AAT TTG GAC CTG AGC ACA CCT GAT CAA TTT GAC AAC 768
Asp Asn Leu Thr Asn Leu Asp Leu Ser Thr Pro Asp Gln Phe Asp Asn
245 250 255



AGA TAC TAC TCC AAT CTT CTG CAG CTC AAT GGC TTA CTT CAG AGT GAC 816
Arg Tyr Tyr Ser Asn Leu Leu Gln Leu Asn Gly Leu Leu Gln Ser Asp
260 265 270



CAA GAA CTT TTC TCC ACT CCT GGT GCT GAT ACC ATT CCC ATT GTC AAT 864
Gln Glu Leu Phe Ser Thr Pro Gly Ala Asp Thr Ile Pro Ile Val Asn
275 280 285



AGC TTC AGC AGT AAC CAG AAT ACT TTC TTT TCC AAC TTT AGA GTT TCA 912
Ser Phe Ser Ser Asn Gln Asn Thr Phe Phe Ser Asn Phe Arg Val Ser
290 295 300
ATG ATA AAA ATG GGT AAT ATT GGA GTG CTG ACT GGG GAT GAA GGA GAA 960
Met Ile Lys Met Gly Asn Ile Gly Val Leu Thr Gly Asp Glu Gly Glu
305 310 315 320 

ATT CGC TTG CAA TGT AAT TTT GTG AAT GGA GAC TCG TTT GGA TTA GCT 1008 
Ile Arg Leu Gln Cys Asn Phe Val Asn Gly Asp Ser Phe Gly Leu Ala
325 330 335 

AGT GTG GCG TCC AAA GAT GCT AAA CAA AAG CTT GTT GCT CAA TCT AAA 1056 
Ser Val Ala Ser Lys Asp Ala Lys Gln Lys Leu Val Ala Gln Ser Lys
340 345 350

TAAACCAATA ATTAATGGGG ATGTGCATGC TAG CT AG CAT GTAAAGGCAA ATTAGGTTGT 1116 



15 



AAACCTCTTT GCTAGCTATA TTGAAATAAA CCAAAGGAGT AGTGTGCATG TCAATTCGAT 1176 



TTTGCCATGT ACCTCTTGGA ATATTATGTA ATAATTATTT GAATCTCTTT AAGGTACTTA 1236 



ATTAATCA 1244 
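Each codon row of the listing can be checked against the three-letter line printed beneath it by translating the row with the standard genetic code. The following is a minimal Python sketch, assuming the standard nuclear code (NCBI translation table 1); the names `translate`, `to_three_letter`, `CODON_TABLE` and `THREE_LETTER` are illustrative helpers, not part of the patent.

```python
# Standard genetic code (NCBI translation table 1); codons are enumerated in
# T, C, A, G order for the first, second and third positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# One-letter to three-letter codes, as printed in the listing ("*" marks a stop).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate(row: str) -> str:
    """Translate one in-frame codon row; spaces and position numbers are ignored."""
    seq = "".join(ch for ch in row.upper() if ch in BASES)
    if len(seq) % 3:
        raise ValueError("row length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def to_three_letter(protein: str) -> str:
    """Render one-letter codes in the three-letter style used by the listing."""
    return " ".join(THREE_LETTER[aa] for aa in protein)


row = "AAT CAA AAC CTT CCA GCA CCT TTC TTC AAC CTC ACT CAA CTT AAA GCT 528"
protein = translate(row)
print(protein)                   # NQNLPAPFFNLTQLKA
print(to_three_letter(protein))  # Asn Gln Asn Leu Pro Ala Pro Phe Phe Asn Leu Thr Gln Leu Lys Ala
```

Building the table from the 64-character string keeps the sketch short; the trailing position number on each row is dropped because only the letters T, C, A and G are kept.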






(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4700 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: promoter
(B) LOCATION: 1..1532

(ix) FEATURE:
(A) NAME/KEY: sig_peptide
(B) LOCATION: 1533..1609

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1533..1751

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2383..2574

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 3605..3769

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 4033..4516

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1752..2382

(ix) FEATURE:
(A) NAME/KEY: intron



CA 02211018 1997-09-19 



(B) LOCATION: 2575..3604

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 3770..4032

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1533..1751

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 2383..2574

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 3605..3769

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 4033..4516
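The exon coordinates in this feature table can be checked against the 352-amino-acid precursor described in the abstract: once the introns are removed, the coding length should be a whole number of codons. The Python sketch below is illustrative only. It assumes 1-based, inclusive coordinates and takes the last coding exon to end at 4515, the final base of the TAA stop codon as numbered in the listing; the names `CODING_EXONS`, `spliced_length` and `intron_spans` are hypothetical.

```python
# Exon coordinates from the SEQ ID NO:2 feature table (1-based, inclusive).
# Assumption: the last coding exon is taken to end at 4515, the final base of the
# TAA stop codon as numbered in the listing (the feature table gives 4516).
CODING_EXONS = [(1533, 1751), (2383, 2574), (3605, 3769), (4033, 4515)]


def spliced_length(exons):
    """Number of nucleotides that remain once the introns are removed."""
    return sum(end - start + 1 for start, end in exons)


def intron_spans(exons):
    """Introns are the gaps between consecutive exons."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


coding = spliced_length(CODING_EXONS)
codons, leftover = divmod(coding, 3)
print(coding, leftover)            # 1059 0
print(codons - 1)                  # 352 residues, not counting the stop codon
print(intron_spans(CODING_EXONS))  # [(1752, 2382), (2575, 3604), (3770, 4032)]
```

With inclusive coordinates each interval contributes `end - start + 1` bases, which is why 219 + 192 + 165 + 483 = 1059 nucleotides, or 353 codons including the stop.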



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 



TAGATAAAAA AATGGGATAT AATTTTTCTC AGATGTTGTT TATACTGTTT TTTTAATCAG 



AATTAAAATT CCTCTTTAAT TATCGACATA ATTTTTTTTG GTGAATATTA TCGACATAAT 



TATTTAATAC AAATTTTTAT TGTACATAGA AGTGATACTT CAATTTTAAT ATTGGAGAAC

AGTACGAAAA CATAAAAAAA CTGTTATTAG AAGAAAAAAA TATATGGAAA AGGTTAGCTA 240

CATATATTAG CTAAATTAGT TGTTCTAATT GGCTATATAA ACCCTATTGT ACTCTTTGTA 300

ATCTCACCTT TTTCATTTAA ATACATTTCT ACTTTTTAAG TTCTATATTT TCTCTCAATT 360

TTCTTCGATA AACCATGAAA TTTAACATGG TATATCAGCG ATACCACCCA CTTTGAAAGC 420

CATGTATGGC TAGTATGGGC AGCCAAAATT TGCCCTGGTT CAAGCAAAGC AAGTGTTTAT 480

ATAGATGTGA CTTTTGTTGA GGAACTCATG CCAATGGTAC TGATTGTQAA ACTGAGAAAA 540

CTAATTTGGA GAATTTGAAT TATGATCATT AAATACTCCT CTCCTGACTA CCTTCGTCCC 600

TCAAATTTGT ACCATCATTA TTTCCCAAAA ATTTGATTAC AATGCACTAA TTAATGAATG 660

TTTCTTACAT TATCATATTA TCATATCTGA CATTTTGTTT TTACTTTTTA TAATAATTAT 720

TTTAAAAAGT CATACATGCA AATAATTTTT TAATAGTTTA CAGTTAAATT TTTACAGTAA 780

AAATGCATGA AAATTAAACT TTATTTTTCC AAGTCATCAT TTAGTCAAAT CCCAAAACAA 840

TGATTATTTT TTGCAAATGA ATGTTTATTG AACATTTAAA TGTAGCCTAA TTAATTCTGG 900

TTATGGTGTC AATGTTCCAA AACCTAATGC AAGATCTTAG CAAGTACATA CATAGATCTA 960

ATTTTAAACT TATCTTTACG CAAGAGATAT AAAGATTATA CATCTAGTTT TAAACATTAA 1020

CTTTTGTTTT TGTGTTAAAA AACAGTAACA TTTTCTTAAT TTTGTAGAGT GACGTGCTCC 1080

AACCATATTA ACGAAGATTT TAATTGGTAT TCAAGTTCAT GAACTTAGTA AATAAGTTTT 1140

GGTCTTCAGT TTTCAATTTT CATTACAACA TTTATGTAAA ATATCAACGT TTTCTGAAAT 1200

TTGTTGCTTG TGTGCTCCAA CCACATTTAA GAGATTATAG AAATTAATTT TCAAGAAGAT 1260

AATGATTCCT ACTCTTGC7G GCCCTACCAT AGTACAATAA ATCCACTCAT AAATCAACAA 1320

GTCGTCGTCA TAGGCAATTG GGCATCATAT CATAAACAAT ACGTACGTGA TATTATCTAG 1380

TGTCTCTCAG TTTACTTTAT GAGAAATTAT TTTTCTTTAA AAAAAGTTAA TTAATAAAAA 1440

CATTTGCGAT ACCGTGAGTT ACAAGAAATC CGCCGAATTC ATCTCTATAA ATAAAAGGAT 1500

CTATATGAGA GGTAAAATCA TATTAACTCA AA ATG GGT TCC ATG CGT CTA TTA 1553
Met Gly Ser Met Arg Leu Leu
355

GTA GTG GCA TTG TTG TGT GCA TTT GCT ATG CAT GCA GGT TTT TCA GTC 1601
Val Val Ala Leu Leu Cys Ala Phe Ala Met His Ala Gly Phe Ser Val
360 365 370 375

TCT TAT GCT CAG CTT ACT CCT ACG TTC TAC AGA GAA ACA TGT CCA AAT 1649
Ser Tyr Ala Gln Leu Thr Pro Thr Phe Tyr Arg Glu Thr Cys Pro Asn
380 385 390

CTG TTC CCT ATT GTG TTT GGA GTA ATC TTC GAT GCT TCT TTC ACC GAT 1697
Leu Phe Pro Ile Val Phe Gly Val Ile Phe Asp Ala Ser Phe Thr Asp
395 400 405

CCC CGA ATC GGG GCC AGT CTC ATG AGG CTT CAT TTT CAT GAT TGC TTT 1745
Pro Arg Ile Gly Ala Ser Leu Met Arg Leu His Phe His Asp Cys Phe
410 415 420

GTT CAA GTACGTACTT TTTTTTTTCC TTCCAAAATG CCCTGCATAT TTAACAAGAT
Val Gln
425 

TGCTTTGTTC ACCTAGAAAA ATGTGTTTTT TTCAACGATC TTACGTACGT TTGTTTGGTT 

TGAAAAATAA ATCAGAAAGA GATCAAGAAA ATAGCTAGAA AGAAAGCAAC GTTTTTTTAA 

AAGGTATTTA GTGTGAGAAA AATATTAAAA CTGAAGAGAA AGAAATTAAA TAAGCTTTTC 

TTGAATGATA TTTACATGTC TTATTAACTT AAAGTCACCT TTTTTCTTTA AGTTGTGCTT 

GAAGAAAAAA GATGTCTTTC AGTTTAGTTT TGATTAATGC TAATTATATT TTTAATTAAT 

TAATTAATAC TATATATCTA TTTACCATAT TAATTATTAC TATATTTCAT GATGACAACA 

GACAAGTATT CTAAAGAGGT ATCGGTAGAT GATTAATTTT TTTATAAAAA AATCTTTTGC 

GTGTATAGAT ATTCTTTTAT AATTGGTGCA GAAACTTGTA ATGCTAATTG CAATTAATCT 

TACATTGATT AACTAATAGC TATAATCAAT ATTTAGGTTA GGTATAGGAG ACAAATCAAG 

TGATCTGAAC AAATTAAGTT GTTATATTTG CATTGTGACA G GGT TGT GAT GGA 

Gly Cys Asp Gly 
1 

TCA GTT TTG CTG AAC AAC ACT GAT ACA ATA GAA AGC GAG CAA GAT GCA 

Ser Val Leu Leu Asn Asn Thr Asp Thr Ile Glu Ser Glu Gln Asp Ala
5 10 15 20

CTT CCA AAT ATC AAC TCA ATA AGA GGA TTG GAC GTT GTC AAT GAC ATC
Leu Pro Asn Ile Asn Ser Ile Arg Gly Leu Asp Val Val Asn Asp Ile
25 30 35

AAG ACA GCG GTG GAA AAT AGT TGT CCA GAC ACA GTT TCT TGT GCT GAT 2538
Lys Thr Ala Val Glu Asn Ser Cys Pro Asp Thr Val Ser Cys Ala Asp
40 45 50

ATT CTT GCT ATT GCA GCT GAA ATA GCT TCT GTT CTG GTAATTAATA 2584
Ile Leu Ala Ile Ala Ala Glu Ile Ala Ser Val Leu
55 60

ACTCCTAATT AATTCCCAAC CATTAAAAAG TTGCATGATT GGATTCAAAA TTCTATGGTA 2644
TTGGGGTTCT GATATAAATT TGTAATTAAA TTGCACTAAA AAAAATTATC ATATACTTTT 2704
AATAAAAAAA ATTTATCTAA TTTAATTTAT TATTAAAACT ATTTTTAAAA TTCAATCCTA 2764
ACTCTTTTTT AATCGGAGCA TGTAAGCTGG CACCCACCGT ATATCGTTGG AAGATGCTAT 2824
AAAACCATTT AATTAATGGA TGGAATCAGT CAAAACATTT AATTCAAAAT ACTCTTAATT 2884
GTGATTAGTA ATCATGTTCG GGCAAGTTAC GTTGTGTATA ATTAATTTGA CTTAATCAGA 2944
TAAAAAAACA AATGGACGCA AGCCGGTTGG TATAGATATC ACTGGCCTGT AGAATATGTG 3004
GTTTTTCACG TTTAAATAAA AGCTAGCTAC TATATTATAT TTAGTCTTTT TTTTTCTTAA 3064
ACCCATTTAA CGTGATTTAT TGACTGTGAA ACATGTTTCC ACACACAGGC TTAGAAACTC 3124
CTCGCAACTA ACATCTCCAA AATTTGACTA TTTATTTATG AAGATAATTC ATCTATGATG 3184
TTCAACTCTA TTATATATAT GTATCATCGC AGTATTAAGA ATTATAATAG TCAAATATAG 3244
AAGTATATCG GGTAAATGTA GTTGCATGTG CGACCTGTTT CGTGTAAAAT GCTTATTCTA 3304
TATAGCTTTT TTTATTGGAA AATAACGATG AACTAAAAAC GAAAGGGTAT CATATAGTTT 3364
GACTTTTATG TTAGAGAGAG ACATCTTAAT TTGGTCATAT GTTAAATAAT TAATTACAAT 3424
GCATACACAA ATATTTATGC CATATCTAAA AAATGATAAA ATATCATAGG TATACTCAAC 3484
TATATGATAT CCCCATAACA GAAATTGTAC TTTTCTTCAG GCAATGAACT TAACATTTCT 3544
GTTTGCTAAA AACAAACATC CACTTAAAGT GGTTCAACAT ATTTATGTAA TAATTTAGAG 3604

GGA GGA GGT CCA GGA TGG CCA GTT CCA TTA GGA AGA AGG GAC AGC TTA 3652
Gly Gly Gly Pro Gly Trp Pro Val Pro Leu Gly Arg Arg Asp Ser Leu
1 5 10 15

ACA GCA AAC CGA ACC CTT GCA AAT CAA AAC CTT CCA GCA CCT TTC TTC 3700
Thr Ala Asn Arg Thr Leu Ala Asn Gln Asn Leu Pro Ala Pro Phe Phe
20 25 30

AAC CTC ACT CAA CTT AAA GCT TCC TTT GCT GTT CAA GGT CTC AAC ACC 3748
Asn Leu Thr Gln Leu Lys Ala Ser Phe Ala Val Gln Gly Leu Asn Thr
35 40 45

CTT GAT TTA GTT ACA CTC TCA GGTATACATA ATCAATTTTT TATTTGCTAT 3799
Leu Asp Leu Val Thr Leu Ser
50 55

TAGCTAGCAA TAAAAAGTCT CTGATACAGA CATATTTAGA TAAATTAATT TCTCCATAAA 3859
CATTTATAAT AAAATTATCA ATTTATGTAC TTAAAAATTA TGGATTGAAG CTCTTTTCAT 3919

CCAACTTTTA CTAAAGTTAA GGTGCATATA ATATAAAATA AACTATCTCT TGTTTCTTAT 3979 

AAAAAGATTG AAGATAAGTT AAAGTCTACT TATAAATCAT TAATATATGT ATA GGT 4035
Gly
1

GGT CAT ACG TTT GGA AGA GCT CGG TGC AGT ACA TTC ATA AAC CGA TTA 4083
Gly His Thr Phe Gly Arg Ala Arg Cys Ser Thr Phe Ile Asn Arg Leu
5 10 15

TAC AAC TTC AGC AAC ACT GGA AAC CCT GAT CCA ACT CTG AAC ACA ACA 4131
Tyr Asn Phe Ser Asn Thr Gly Asn Pro Asp Pro Thr Leu Asn Thr Thr
20 25 30

TAC TTA GAA GTA TTG CGT GCA AGA TGC CCC CAG AAT GCA ACT GGG GAT 4179
Tyr Leu Glu Val Leu Arg Ala Arg Cys Pro Gln Asn Ala Thr Gly Asp
35 40 45

AAC CTC ACC AAT TTG GAC CTG AGC ACA CCT GAT CAA TTT GAC AAC AGA 4227
Asn Leu Thr Asn Leu Asp Leu Ser Thr Pro Asp Gln Phe Asp Asn Arg
50 55 60 65

TAC TAC TCC AAT CTT CTG CAG CTC AAT GGC TTA CTT CAG AGT GAC CAA 4275
Tyr Tyr Ser Asn Leu Leu Gln Leu Asn Gly Leu Leu Gln Ser Asp Gln
70 75 80

GAA CTT TTC TCC ACT CCT GGT GCT GAT ACC ATT CCC ATT GTC AAT AGC 4323
Glu Leu Phe Ser Thr Pro Gly Ala Asp Thr Ile Pro Ile Val Asn Ser
85 90 95

TTC AGC AGT AAC CAG AAT ACT TTC TTT TCC AAC TTT AGA GTT TCA ATG 4371
Phe Ser Ser Asn Gln Asn Thr Phe Phe Ser Asn Phe Arg Val Ser Met
100 105 110

ATA AAA ATG GGT AAT ATT GGA GTG CTG ACT GGG GAT GAA GGA GAA ATT 4419
Ile Lys Met Gly Asn Ile Gly Val Leu Thr Gly Asp Glu Gly Glu Ile
115 120 125

CGC TTG CAA TGT AAT TTT GTG AAT GGA GAC TCG TTT GGA TTA GCT AGT 4467
Arg Leu Gln Cys Asn Phe Val Asn Gly Asp Ser Phe Gly Leu Ala Ser
130 135 140 145

GTG GCG TCC AAA GAT GCT AAA CAA AAG CTT GTT GCT CAA TCT AAA TAA 4515
Val Ala Ser Lys Asp Ala Lys Gln Lys Leu Val Ala Gln Ser Lys *
150 155 160

ACCAATAATT AATGGGGATG TGCATGCTAG CTAGCATGTA AAGGCAAATT AGGTTGTAAA 4575
CCTCTTTGCT AGCTATATTG AAATAAACCA AAGGAGTAGT GTGCATGTCA ATTCGATTTT 4635
GCCATGTACC TCTTGGAATA TTATGTAATA ATTATTTGAA TCTCTTTAAG GTACTTAATT 4695
AATCA 4700
THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. An isolated DNA molecule comprising the nucleotide sequence of SEQ ID
NO:1.

2. An isolated DNA molecule comprising at least 24 contiguous nucleotides
selected from nucleotides 1-1532 of SEQ ID NO:2.

3. An isolated DNA molecule comprising a nucleotide sequence substantially
homologous to nucleotides 1533-4700 of SEQ ID NO:2.

4. The isolated DNA molecule of claim 3 comprising a nucleotide sequence
substantially homologous to that of nucleotides 1-4700 of SEQ ID NO:2.

5. The isolated DNA molecule of claim 3 comprising nucleotides 1533-4700 of
SEQ ID NO:2.

6. The isolated DNA molecule of claim 4 comprising the nucleotide sequence of
SEQ ID NO:2.

7. The isolated DNA molecule of claim 2 comprising a nucleotide sequence
substantially homologous to that of nucleotides 1-1532 of SEQ ID NO:2.

8. The isolated DNA molecule of claim 7, comprising the nucleotide sequence of
nucleotides 1-1532 of SEQ ID NO:2.

9. An isolated DNA molecule of claim 3 comprising at least 32 contiguous
nucleotides selected from nucleotides 412-1041 of SEQ ID NO:2.

10. An isolated DNA molecule of claim 9 comprising the nucleotide sequence of
nucleotides 412-1041 of SEQ ID NO:2.

11. An isolated DNA molecule of claim 3 comprising at least 23 contiguous
nucleotides selected from nucleotides 1234-2263 of SEQ ID NO:2.

12. An isolated DNA molecule of claim 11 comprising the nucleotide sequence of
nucleotides 1234-2263 of SEQ ID NO:2.

13. An isolated DNA molecule of claim 3 comprising at least 22 contiguous
nucleotides selected from nucleotides 2430-2691 of SEQ ID NO:2.

14. An isolated DNA molecule of claim 13 comprising the nucleotide sequence of
nucleotides 2430-2691 of SEQ ID NO:2.

15. A vector which comprises the DNA molecule of claim 1.

16. A vector which comprises the DNA molecule of claim 2.

17. A vector which comprises the DNA molecule of claim 3.

18. The vector of claim 16 which comprises a heterologous gene of interest under
control of the DNA molecule.

19. A host cell capable of expressing the DNA molecule within the vector of claim
15.

20. A host cell capable of expressing the DNA molecule within the vector of claim
16.

21. A host cell capable of expressing the DNA molecule within the vector of claim
17.

22. A host cell capable of expressing the DNA molecule within the vector of claim
18.

23. A transgenic plant comprising the vector of claim 15.

24. A transgenic plant comprising the vector of claim 16.

25. A transgenic plant comprising the vector of claim 17.

26. A transgenic plant comprising the vector of claim 18.

27. A method for the production of soybean seed coat peroxidase in a host cell
comprising:

i) transforming the host cell with a vector comprising an isolated DNA
molecule selected from the group consisting of SEQ ID NO:1 and SEQ ID
NO:2; and

ii) culturing the host cell under conditions that allow expression of the soybean
seed coat peroxidase.

28. A process for producing a heterologous gene of interest comprising propagating
a plant transformed with the vector of claim 16.

29. The process of claim 28 wherein the heterologous gene of interest is produced
within seed coat cells.
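Claims 2, 9, 11 and 13 recite a minimum number of contiguous nucleotides selected from a numbered stretch of SEQ ID NO:2. The Python sketch below shows one literal, purely computational reading of that wording (an exact, unbroken match lying inside the stated range). It is not a construction of the claims; the function `is_contiguous_fragment` and the 60-nucleotide stand-in reference (the first row of SEQ ID NO:2) are hypothetical illustrations.

```python
def is_contiguous_fragment(candidate: str, reference: str,
                           start: int, end: int, min_len: int) -> bool:
    """Hypothetical check: is `candidate` at least `min_len` nucleotides long and
    found as one unbroken run inside reference[start..end] (1-based, inclusive)?"""
    if len(candidate) < min_len:
        return False
    region = reference[start - 1:end].upper()
    return candidate.upper() in region


# Stand-in reference: only the first 60 nucleotides of SEQ ID NO:2.
seq_id_no_2_start = (
    "TAGATAAAAA" "AATGGGATAT" "AATTTTTCTC"
    "AGATGTTGTT" "TATACTGTTT" "TTTTAATCAG"
)
probe = seq_id_no_2_start[10:34]  # nucleotides 11-34, i.e. 24 nt

print(is_contiguous_fragment(probe, seq_id_no_2_start, 1, 60, 24))       # True
print(is_contiguous_fragment(probe[:20], seq_id_no_2_start, 1, 60, 24))  # False: only 20 nt
print(is_contiguous_fragment(probe, seq_id_no_2_start, 40, 60, 24))      # False: outside 40-60
```

Python slices are 0-based and end-exclusive, so `reference[start - 1:end]` is what maps a 1-based, inclusive range such as "nucleotides 1-1532" onto the string.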
Figure 1

ATGGGTTCCATGCGTCTATT 20
MGSMRLL
prx9+ >

AGTAGTGGCATTGTTGTGTGCATTTGCTATGCATGCAGGTTTTTCAGTCTCTTATGCTCA 80
VVALLCAFAMHAGFSVSYA Q 1
signal sequence

GCTTACTCCTACGTTCTACAGAGAAACATGTCCAAATCTGTTCCCTATTGTGTTTGGAGT 140
LTPTFYRETCPNLFPIVFGV 21
prx12+ >

AATCTTCGATGCTTCTTTCACCGATCCCCGAATCGGGGCCAGTCTCATGAGGCTTCATTT 200
IFDASFTDPRIGASLMRLHF 41
active site

I
TCATGATTGCTTTGTTCAAG GTTGTGATGGATCAGTTTTGCTGAACAACACTGATACAAT 260
HDCFVQGCDGSVLLNNTDTI 61
< prx10-   prx2+ >

AGAAAGCGAGCAAGATGCACTTCCAAATATCAACTCAATAAGAGGATTGGACGTTGTCAA 320
ESEQDALPNINSIRGLDVVN 81

TGACATCAAGACAGCGGTGGAAAATAGTTGTCCAGACACAGTTTCTTGTGCTGATATTCT 380
DIKTAVENSCPDTVSCADIL 101

II
TGCTATTGCAGCTGAAATAGCTTCTGTTCTG GGAGGAGGTCCAGGATGGCCAGTTCCATT 440
AIAAEIASVL GGGPGWPVPL 121

AGGAAGAAGGGACAGCTTAACAGCAAACCGAACCCTTGCAAATCAAAACCTTCCAGCACC 500
GRRDSLTANRTLANQNLPAP 141

TTTCTTCAACCTCACTCAACTTAAAGCTTCCTTTGCTGTTCAAGGTCTCAACACCCTTGA 560
FFNLTQLKASFAVQGLNTLD 161

III
TTTAGTTACACTCTCAG GTGGTCATACGTTTGGAAGAGCTCGGTGCAGTACATTCATAAA 620
LVTLS GGHTFGRARCSTFIN 181
heme-binding domain

CCGATTATACAACTTCAGCAACACTGGAAACCCTGATCCAACTCTGAACACAACATACTT 680
RLYNFSNTGNPDPTLNTTYL 201

AGAAGTATTGCGTGCAAGATGCCCCCAGAATGCAACTGGGGATAACCTCACCAATTTGGA 740
EVLRARCPQNATGDNLTNLD 221

CCTGAGCACACCTGATCAATTTGACAACAGATACTACTCCAATCTTCTGCAGCTCAATGG 800
LSTPDQFDNRYYSNLLQLNG 241

CTTACTTCAGAGTGACCAAGAACTTTTCTCCACTCCTGGTGCTGATACCATTCCCATTGT 860
LLQSDQELFSTPGADTIPIV 261
< prx6-

CAATAGCTTCAGCAGTAACCAGAATACTTTCTTTTCCAACTTTAGAGTTTCAATGATAAA 920
NSFSSNQNTFFSNFRVSMIK 281

AATGGGTAATATTGGAGTGCTGACTGGGGATGAAGGAGAAATTCGCTTGCAATGTAATTT 980
MGNIGVLTGDEGEIRLQCNF 301

TGTGAATGGAGACTCGTTTGGATTAGCTAGTGTGGCGTCCAAAGATGCTAAACAAAAGCT 1040
VNGDSFGLASVASKDAKQKL 321

TGTTGCTCAATCTAAATAAACCAATAATTAATGGGGATGTGCATGCTAGCTAGCATGTAA 1100
VAQSK* 326

AGGCAAATTAGGTTGTAAACCTCTTTGCTAGCTATATTGAAATAAACCAAAGGAGTAGTG 1160

TGCATGTCAATTCGATTTTGCCATGTACCTCTTGGAATATTATGTAATAATTATTTGAAT 1220

CTCTTTAAGGTACTTAATTAATC (A)n
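The abstract describes a 38 kDa precursor carrying a 26-amino-acid signal sequence that is cleaved to give a 35 kDa protein. As a rough plausibility check, the average mass of the signal sequence marked in Figure 1 can be summed from standard average residue masses. The Python sketch below assumes rounded textbook values and ignores any post-translational modification; `RESIDUE_MASS`, `peptide_mass` and `SIGNAL_SEQUENCE` are illustrative names.

```python
# Average residue masses in daltons (amino acid minus one water), rounded
# standard values; an intact linear peptide adds one water for its termini.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02


def peptide_mass(seq: str) -> float:
    """Approximate average mass of a linear peptide: residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


# Signal sequence as annotated in Figure 1 (residues preceding Q1 of the mature protein).
SIGNAL_SEQUENCE = "MGSMRLLVVALLCAFAMHAGFSVSYA"
print(len(SIGNAL_SEQUENCE), round(peptide_mass(SIGNAL_SEQUENCE) / 1000, 1))  # 26 2.7
```

The result, about 2.7 kDa, is of the same order as the roughly 3 kDa difference between the two sizes given in the abstract, which are themselves approximate estimates.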
Figure 2

10 20 30 40 50 60 

I I I I I I 

1 GCATCATATCATAAACAATACGTACGTGM 
61 AGAAATTATTTTTCTTTAAAAAAAGTTAATTAATAA 
121 CAAGAAATCCGCCGAATTCATCTCTATAAATAAAAGGATCTATATGAGAGGTAAAATCAT
181 ATTAACTCAAAATGGGTTCCATGCGTCTATTAGTAGTGGCATTGTTGTGTGCA 
241 TGCATGCAGGTTTTTCAGTCTCTTATGCTCAGCTTACTCCTA 
301 GTCCAAATCTGTTCCCTATTGTGTTTGGAGTAATCTTCGATGCTTCT^ 
361 GAATCGGGGCCAGTCTCATGAGGCTTCATTTTCATGATTGCTTTC 
421 TTTTTTTTCCTTCCAAAATGCCCTGCATATTTAACAAGATTC 
481 ATGTGTTTTTTTCAACGATCTTACGTACGTTTGCT 

541 GATCAAGAAAATAGCTAGAAAGAAAGCAACGTTTTTTTAAAAGGTATTTAGTGTGAGAAA

601 AATATTAAAACTGAAGAGAAAGAAATTAAATAAGCTTTTCTTGAATGAT 

661 TTATTAACTTAAAGTCACCTTTTTTCTrTAAGTTGTC 

721 AGTTTAGTTTTGATTAATGCTAATTATATTTTTAATTAATTAATTAATAC 

781 TTTACCATATTAATTATTACTATATTTC^^ 

841 ATCGGTAGATGATTAATTTTTTTATAAAAAA 

901 AATTGGTGCAGAAACTTGTAATGCTAATTGCAATTAATCTTACATTGATTAA 
961 TATAATCAATATTTAGGTTAGGTATAGGAGACAAATCAAGTGATCTGAACAAATTAAGTT 
1021 GTTATATTTGCATTGTGACAGGGTTGTGATGGATCAGTTTTGCTC 

1081 ATAGAAAGCGAGCAAGATGCACTTCCAAATATCAACTCAATAAGAGGATTGGACGTTGTC 
1141 AATGACATCAAGACAGCGGTGGAAAATAGTTGTCCAGACACAGTTTCTTGTGCTGATATT 
1201 CTTGCTATTGCAGCTGAAATAGCTTCTGTTCTGGTAATTAATAACTCCTAATTAATTCCC 
1261 AACCATTAAAAAGTTGCATGATTGGATTCAAAATTCTATGGTATTGGGGTTCTGATATAA 
1321 ATTTGT AATTAAATTG CACT AAAAAAAA.TT AT CAT AT ACTTTT AAT AAAAAAAATTTAT C 
1381 TAATTTAATTTATTATTAAAACTATTTTTAAAATTCAATCCTAACTCTTTTTTAATCGGA 
1441 GCATGTAAGCTGGCACCCACCGTATATCGTTGGAAGATGCTATAAAACCATTTAATTAAT 
1501 GGATGGAATCAGTCAAAACATTTAATTCAAAATACTCTTAATTGTGATTAGTAATCATGT 
1561 TCGGGCAAGTTACGTTGTGTATAATTAATTTGACTTAATCAGATAAAAAAACAAATGGAC 
1621 GCAAGCCGGTTGGTATAGATATCACTGGCCTGTAGAATATGTGGTTTTTCACGTTTAAAT 
1681 AAAAGCTAGCTACTATATTATATTTAGTCTriTriTTTCTTAAACCCATTTAACGTGATT 
1741 TATTGACTGTGA^CATGTTTCCACACACAGGCTTAGAAACTCCTCGCAACTAAC^TCTC 
1801 CAAAATTTGACTATTTATTTATGAAGATAATTCATCTATGATGTTCAACTCTATTATATA 
1861 TATGTATCATCGCAGTATTAAGAATTATAATAGTCAAATATAGAAGTATATCGGGTAAAT 
1921 GTAGTTGCATGTGCGACCTGTTTCGTGTAAAATGCTTArrCTATATAGCTTTTTTTATTG 
1981 GAAAATAACGATGAACTAAAAACGAAAGGGTATCATATAGTTTGACTTTTATGTTAGAGA 
2041 GAGACATCTTAATTTGGTCATATGTTAAATAATTAATTACAATC 

2101 TGCCATATCTAAAAAATGATAAAATATCATAGGTATACTCAACTATATGATATCCCCATA 
2161 ACAGAAATTGTACTTTTCTTCAGGCAATGAACTTAACATTTCTGTT^ 
2221 ATC»CTTAAAGTGGTT(^CATATTTATGTAATAATTTACAGGGAGGAGGTCCAGGATG 
2281 GCCAGTTC CATTAGGAAGAAGGGACAGCTTAACAGCAAACCGAAC CCTTGCAAAT CAAAA 
2341 CCTTCCAGCACCTTTCTTCAACCTCACT^ 

2401 CAACACCCTTGATTTAGTTACACTCTCAGGTATACATAATCAATTTTCT 

2461 GCTAGCAATAAAAAGTCTCTGATACAGACATATTTAGATAAATTAATTTCTCCA 

2521 TTTATAATAAAATTATCAATITATGTACTTAAAAATTATGGATTGAAGCTCTTTTC^ 

2581 AACTTTTACTAAAGTTAAGGTGCATATAATATAAAATAAACTATCTCTTG1TTCTTAT 

2641 AAAGATTGAAGATAAGTTAAAGTCTACTTATAAAT(^TTAATATATGTATAGGTGGTCAT 

2701 ACGTTTGGAAGAGCTCGGTGCAGTACATTCATAAA£CGAT^ 

2761 GGAAACCCTGATCCAACTCTGAACACAACATAOT 

2821 CAGAATGCAACTGGGGATAACCTCACCAATTTGGAC 

2881 AACAGATACTACTCCAATCTTCTGCAGCTCAATGGCTTACTTCAGAGTC 

2941 TTCTCCACTCCTGGTGCTGATACCATTCCCATTGTCAATAG 

3001 ACTTTCTTTTCCAACTITAGAGTTTCAATGATAAAAATGGGTAA 
3061 GGGGATGAAGGAGAAATTCGCTTGCA&TC
3121 GCTAGTGTGGCGTCCA^GATGCTAAACAAAAGCTTGTTGCT 
3181 AATTAATGGGGATGTGCaTGCTAGCTAGCATGTAAAGGCAAATTA 
3241 TGCTAGCTATATTGAAATAAACCAAAGGAGTAGTGTGCATC 

3301 TACCTCTTGGAATATTATGTAATAATTATTTGAATCTCTTTAAGGTACTTAATTAAT 
Figure 3A

L78163 ATGGGTTC CATGCGT - CTATTAGT AGTGGCATTGTTG 36
U41657 - 0
X90693 G GCAAA- CAATGAACTCCCTTCXjTGCTGTAGCAATAG - CTTTGTGC 44
X90694 GCTCTTCAAAACAATGAACTCC TTAGCAACTT - CTATGTGG 40
L36156 CTCC TTAGCAACTT -CTATGTGG 22
X90692 AATGCTTGGT -CTAAGTGCAACAGCTTTTTGCTGTATGG 38

L78163 TGT GCATTT - GCTATGCATGCAGGTTTTTCAGT CTCTTATGC 77
U41657 - 0
X90693 TGTATTGTG GTTGTGCTTGGAGGGTTACCCTTCTCTTCAAATGC 88
X90694 TGTGTTGTGCTTTTAGTTGTGCTTGGAGGACTACCCTTTTCCTCAGATGC 90
L36156 TGTGTTGTCCTTTTAGTTGTGCTTGGAGGACTACCCTTTTCCT 72
X90692 TGT - TTGTGCTAAT TGGAGGAGTACCCTTTT CAAATGC - 75

L78163 TCAGCTTACTCCTACGTTCTACAGAGAAACATGTCCAAATCTGTTCCCTA 127
U41657 - 0
X90693 GCAACTTGATCCATCCTTTTACAGGAACACTTGTCCAAATGTTAGTTCCA 138
X90694 ACAACTTAGTCCCACTTTTTACAGCAAAACGTGTCCAACTGTTAGTTCCA 140
L36156 ACAACTTAGTCCCACTTTTTACAGCAAAACGTC^ 122
X90692 ACAACTAGATCCTTCATTTTACAACAGTACATGTTCTAATCTTGATTCAA 125

L78163 TTGTGTTTGGAGTAATCTTCGATGCTTCTTTCACCGATCCCCGAATCGGG 177
U41657 - 0
X90693 TTGTTCGTGAAGTCATAAGGAGTGTTTCTAAGAAAGATCCTCGTATGCTT 188
X90694 TTGTTAGCAATGTCTTAACAAACGTTTCTAAGACAGATCCTCGCATGCTT 190
L36156 TTGTTAGCAATGTCTTAACAAACGTTTCTAAGACAGATCCTCGCATGCTT 172
X90692 TCGTACGTGGTGTGCTCACAAATGTTTCACAATCTGATCCCAGAATGCTT 175

L78163 GCCAGTCTCATGAGGCTTCATTTTCATGATTGCTTTGTTCAAGGTTGTGA 227
U41657 TTTCATGATTGCTTTGTTCAAGGTTGTGA 29
X90693 GCTAGTCTTGTCAGGCTTCACTTTCATGACTGTTTTGTTCAAGGTTGTGA 238
X90694 GCTAGTCTCGTCAGGCTTCACTTTCATGACTGTTTTGTTCTGGGATGTGA 240
L36156 GCTAGTCTCGTCAGGCTTCACTTTCATGACTGTTTTGTTCTGGGATGTGA 222
X90692 GGTAGTCTCATCAGGCTACATTTTCATGACTGTTTTGTTCAAGGTTG 225
******** ** *********** **

L78163 TGGATCAGTTTTGCTGAACAACACTGATACAATAGAAAGCGAGCAAGATG 277
U41657 TGGATCAGTTTTACTGAACAACACTGATAC^TAGAAAGCGAGCAAGATG 79
X90693 TGCATCAGTTTTACTAAACAAAACTGATACCGTTGTGAGTGAACAAGATG 288
X90694 TGCCTCAGTTTTGCTGAACAATACTGCTACAATCGTAAGCGA^ 290
L36156 TGCCTCAGTTTTGCTGAACAATACTGCTACAATCGTAAGCGAACAACAAG 272
X90692 TGCCTCGATTTTGCTGAACGATACGGCTACAATAGTGAGCGAGCAAAGTG 275
** ** r m **** m ** b *** . * **,* **★ .* ★ ..** **.★** . *

L78163 CACTTCCAAATATCAACTCAATAAGAGGATTGGACGTTGTCAATGACATC 327
U41657 CACTTCCAAATATCAACTCAATAAGAGGATTGGACGTTC 129
X90693 CTTTTCCAAACAGAAACTCATTAAGAGGTTTGGATGTTC 338
X90694 CTTTTCCAAATAACAACTCTCTAAGAGGTTTGGATGTTGTGAATCAGATC 340
L36156 CTTTTCCAAATAACAACTCTCTAAGGGGTTTGGATGTTGTGAAT 322
X90692 CACCACCAAATAACAACTCCATAAGAGGTTTGGATGTGATAAACCAGATC 325
* # # ***** * ***** **** ^ ** m ***** ** >t * ** * ***

L78163 AAGACAGCGGTGGAAAATAGTTGTCCAGACACAGTTTCTTGTGCTGATAT 377
U41657 AAGACAGCGGTGGAAAATAGTTGTCCAGACACAGTTTCTTGTGCTGATAT 179
X90693 AAAACAGCTGTGGAAAAGGCTTCTCCTAACACAGTTTCTTC 388
X90694 AAACTGGCTGTAGAAGTGCCTTGTCCTAACACAGTTTCT^ 390
L36156 AAAACTGCTGTAGAAAGTGCTTGTCCTAACACAGTTTCTT 372
X90692 AAAACAGCGGTGGAAAATGCTTGTCCTAACACAGTTTCTTGTGCTGATAT 375
i* . ***** ******
, **********************



L78163 TCTTGCTATTGCAGCTGAAATAGCTTCTGTT-CTGGGAGGAGGTCCAGGA 426
U41657 TCTTGCTATTGCAGCTGAAATAGCTTCTGTTGCTGGGAGGAGGTC-AGGA 228
X90693 TCTTGCTCTTTCTGCTGAATTATCATCTACA-CTGGCAGATGGTCCTGAC 437
X90694 TCTTGCACTTGCTGCTCAAGCATCCTCTGTT-CTGGCACAAGGTCCTAGT 439
L36156 TCTTGCACTTGCT---CAAGCATCCTCTGTT-CTGGCACAAGGTCCTAGT 418
X90692 TCTTGCTCTTTCTGCTGAAATATCATCTGAT-CTGGCAAATGGTCCTACT 424
******. **.*. *+. *.* ***. . **** * _****

L78163 TGGCCAGTTCCATTAGGAAGAAGGGACAGCTTAACAGCAAACCGAACCCT 476
U41657 TGGCCAGTTCCATTAGGAAGAAGGGACAGCTTAACAGCAAACCGAACCCT 278
X90693 TGGAAGGTTCCTTTAGGAAGAAGAGATGGTTTAACGGCAAACCAGTTACT 487
X90694 TGGACGGTTCCTTTAGGAAGAAGGGATGGTTTAACCGCAAACCGAACACT 489
L36156 TGGACGGTTCCTTTAGGAAGAAGGGATGGTTTAACCGCAAACCGAACACT 468
X90692 TGGCAAGTTCCATTAGGAAGAAGGGATAGTTTGACAGCAAATAATTCCCT 474
*** .*****.************* + **** *****

L78163 TGCAAATCAAAACCTTCCAGCACCTTTCTTCAA--CCTCA-CTCAACTTA 523
U41657 TGCAAATCAAAACCTTCCAGCACCTTTCTTCAA--CCTCA-CTCAACTTA 325
X90693 TGCTAATCAAAATCTTCCAGCTCC---TTTCAATACTACTGATCAACTTA 534
X90694 TGCAAATCAAAATCTTCCGGCTCC---ATTCAATTCCTTGGATCAACTTA 536
L36156 TGCAAATCAAAATCTTCCGGCTCC---ATTCAATTCCTTGGATCACCTTA 515
X90692 TGCAGCTCAAAATCTTCCTGCCCCCACTTTCAA--CCTTA-CTCGACTAA 521
***.. ****** ******* ** ***** * ** ** *

L78163 AAGCTTCCTTTG-CTGTTCAAGGTCTCAACACCCTTGATTTAGTTACACT 572
U41657 AAGCTTCCTTTG-CTGTTCAAGGTCTCAACACCCTTGATTTAGTTACACT 374
X90693 AAGCTGCATTTG-CTGCTCAAGGTCTCGATACTACTGATCTGGTTGCACT 583
X90694 AAGCTGCATTT-ACTGCTCAAGGCCTCAATACTACTGATCTAGTTGCACT 585
L36156 AA-CTGCATTTGACTGCTCAAGGCCTCATTACTCCTGTTCTAGTTGCCCT 564
X90692 AATCTAACTTTGA-TAATCAAAACCTCAGTACTACTGATCTAGTTGCACT 570
** **. *** *.****_ ***.. ** *** ***** *.*

L78163 CTCAGGTGGTCATACGTTTGGAAGAGCTCGGTGCAGTACATTCATAAACC 622
U41657 CTCAGGTGGTCATACGTCTGGAAGAGCTCGGTGCAGTACATTCATAAACC 424
X90693 CTCCGGTGCTCATACATTTGGAAGAGCTCATTGCTCTTTATTTGTTAGCC 633
X90694 CTCGGGTGCTCATACATTTGGAAGAGCTCATTG 635
L36156 CTCGGGTGCTCATACATTTCGAAGAGCTCATTGCGCACAA^ 614
X90692 CTCAGGTGGCCATACAATTGGAAGAGGTCAATGCAGATTTTTCGTTGATC 620
*** **** ***** ******** ***** * + + ^

L78163 GATTATACAACTTCAGCAACACTGGAAACCCTGATCCAACTCTGAACACA 672
U41657 GATTATACAACTTCAGCAACACTGGA CTGATCCA-CT-TGGACACA 468
X90693 GATTGTACAACTTCAGCGGTACGGGAAGTCCCGATCCAACTCTTAACACA 683
X90694 O^TTGTAGAACTTCAGCAGTAC 685
L36156 GATTGTACAACTTCAGCAGTACTGGAAGTCCCGATCCAACTCTTAACACA 664
X90692 GATTATACAATTTCAGCAACACTGGAAACCCCGATTCAACTCTTAACACG 670
********* ****** ***** * *** ** * ****

L78163 ACATACTTAGAAGTATTGCGTGCAAGATGCCCCCAGAATGCAACTGGGGA 722
U41657 ACATACTTAGAAGTATTGCGTGCAAGATGCCCCCAGAATGCAACTGGGGA 518
X90693 ACTTACTTACAACAATTGCGCACAATATGTCCCAATGGTGGACCTGGCAC 733
X90694 ACTTACTTACAACAACTGCGCACAATATGTCCCAATGGTGGACCTGGCAC 735
L36156 ACTTACTTACAACAACTGCGCACAATATGTCCCAATGGTGGACCTGGCAC 714
X90692 ACOTATTTACAAACATTGCAAGCAATATGTCCCAATGGTGGACCTGGTAC 720
** ** *** ** * *** ****** *** * ** * ****

L78163 TAACCTCACCAATTTGGACCTGAGCACACCTGATCAATTTGACAACAGAT 772
U41657 TAACCTCACCAATTTGGACCTGAGCA£ACCTGA 568
X90693 GAACCTTACCAATTTCGATCCAACGACTCCTGATAAATTTGACAAGAACT 783
X90694 AAACCTTACCAATTTCGATCCAACGACTCCTGATAAATTTGACAAGAACT 785
L36156 AAACCTTACCAATTTCGATCCAACGACTCCTGATAAATTTGACAAGAACT 764
X90692 AAACCTAACCGATTTGGACCCAACCACA.CCAGATACATTTGACT 770
.****★ ******* ** * * ******* ******* * *



L78163 ACTACTCCAATCTTCTGCAGCTCAATGGCTTACTTCAGAGTGACCAAGAA 822
U41657 ACTACTCCAATCTTCTGCAGCTCAATGGCTTACTTCAGAGTGACCAAGAA 618
X90693 ATTACTCTAATCTTCAAGTGAAAAAAGGTTTGCTTCAAAGTGATCAAGAG 833
X90694 ATTACTCCAATCTTCAAGTGAAAAAGGGTTTGCTCCAAAGTGATCAAGAG 835
L36156 ATTACTCCAATCTTCAAGTGAAAAAGGGTTTGCTCCAAAGTGATCAAGAG 814
X90692 ACTACTCCAATCTCCAAGTTGGAAAGGGCTTGTTTCAGAGTGACCAAGAG 820
. . *+.** **. * **.***** ***** ^
* ***** ***** *

L78163 CTTTTCTCCACTCCTGGTGCTGATACCATTCCCATTGTCAATAGCTTCAG 872
U41657 CGTTTCTCCACTCCTGGTGCTGATACCATTCC-ATTGTCAATAGCTTCAG 667
X90693 TTGTTCTGAACATCTGGTTCAGATAC 883
X90694 TTGTTCTCAACTTCTGGTGCAGATACCATTAGCATT^ 885
L36156 TTGTTCTCAACTTCTGGTGCAGATACCATTAGCATTGTCG^ 864
X90692 CTTTTTTCCAGAAATGGTTCTGACACTAT^ 870
a .** ** * b ******* ** *** ******* * ***

L78163 CAGTAACCAGAATACTTTCTTTTCCAACTTTAGAGTTTCAATGATAAAAA 922
U41657 CG--AACC^GAATACTTTCTTTTCCAACTTTAGAGTTTCAATGATAAAAA 715
X90693 AACCGATCAAAAAGCTTTTTTTGAGAGCTTTAGGGCTGCTATGATCAAAA 933
X90694 CACCGATCAAAATGCTTTCTTTGAGAGCTTTAAG 935
L36156 CACCGATCAAAATGCTTTCTTTGAGAGCTTTAAGGCTGCAATGATTAA^ 914
X90692 CAATAATCAAACTCTCTTCTTTGAAAATTTTGTAGCCTCAATGATAAAAA 920
.***.* r ** *** * *** * * ***** ****

L78163 TGGGTAATATTGGAGTGCTGACTGGGGATGAAGGAGAAATTCGCTTGCAA 972
U41657 TGGGTAATATTGGAGTGCTGACTGGGGATGAAGGAGAAATTCGCTTC 765
X90693 TGGGAAATATTGGTGTGTTAACCGGGAACCAAGGAGAGATTAGAAAACAA 983
X90694 TGGGCAATATTGGTGTGCTAACAGGGACAAAAGGAGAGATTAGAAAAC^ 985
L36156 TGGGCAATATTGGTGTGCTAACAGGGAC^AAAGGAGAGATTAGAAAAC^ 964
X90692 TGGGTAATATTGGAGTTTTAACTGGATCTCAAGGTGAAATTAGAAC^ 970
**** ******** ** * ** **
**** ** *** *

L78163 TGTAATTTTGTGAA TGGAGACTCGT TTGGATTAGC 1007
U41657 TGTAATTTTGTGAA TGGAGACTCGT TTGGATTAGC 800
X90693 TG CAACTTTGTT AATT CAAAATCAGCAGAACTTGG TCTTAT 1024
X90694 TGCAACTTTGTCAACTTTGTGAACTCA^ 1035
L36156 TGCAACTT TGTGAACTCAAATTCTGCAGAACTAGATTTAGC 1005
X90692 TG TAATGCTGTGAATGGGAATTCTTC TGGATTGGC 1005

L78163 TAGTGTGGCGTCCAAAGATGCTAAACAAAAGCTTGTTGCTCAATCTAAAT 1057
U41657 TAGTGTGGCGTCGA^GATGCTAAACAAAAGCTTGTTGCTCAATCTAAAT 850
X90693 CAATGTTGCCTC AGCAG--ATTCATCTG-AGGAGGGTATGGTTAG-- 1066
X90694 CACCATAGCATCCATAGTAG--AATCATTAG-AGGATGGTATTGCTAGTG 1082
L36156 CACCATAGCATCCATAGTAG--AATCATTAG-AGGATGGAATTGCTAGTG 1052
X90692 TACTGTAGTCACCAA AG--AATCATCAG-AAGATGGAATGGCTAGCT 1049
* .*.* .* .* *..**. . * ..*..* . ... **.

L78163 AAACCAATAATTAATGGGGATGTGCATGCTAGCTAGCATGTAAAGGCAAA 1107
U41657 AAACCAATAATTAATGGGGATGTCGATGCTAGCTACGATGTAAAGGCAAA 900
X90693 CTCAATGTAAA-TG-TAG 1082
X90694 TAATATAAATAAATTAG CGTAAATGCACTTATTGAA-ATCTTG 1124
L36156 TAATATAAATAAATTAG CGAAAATGCACTTATTGAA-ATCTTG 1094
X90692 CATTCTAAAT--ATAAG CTTGGAAAATATTGAAGAGGTTCTAT 1090

L78163 TTAGGTTGTAAACCTCTTTGCTAGCTATATTGAAATAAACCAAAGGAGTA 1157
U41657 TTAGGTTG-AAACCTCTTTGCTAGCTATATTGAAATAAACCAAAGGAGTA 949
X90693 T--GATTGGAAGCAACTAA--TAAATTAAGAAGCTATAAC T 1119
X90694 T--GACTAGATGCCACTAA--TAAATAAGTTATAAC T 1157
L36156 T--GACTAGATCCCACTAA--TAAATAAGTTATAAC T 1127
X90692 A--ATTTTGTGCATACATA--TATGGTATGTG- 1118
* * **

L78163 GTGTGCATGTCAATTCGATTTTGC-CATGTACCTCTTGGAATAT 1200
U41657 GTGTCGATGTCAATTCGATTTTGC-CATGTACCTCTTGGAATATTATGTA 998
X90693 ATGCACATT-CATGGTATGTGTGAGATAGTTATTAGATGCTTTGTGAGCA 1168
X90694 AGGCACATTTCATGTCACTTGAAATTTCATGCCT-GTATATGAG 1200
L36156 AGGCACATTTCATGTCACTTGAAATCCTATGCCTTGTATATTAGAGGACG 1177
X90692 CATGTGGTGTA--TTATGTTTTTGTTATGTTCTTCAAGTTGATCA 1161

L78163 1200
U41657 ATAATTATTTGAATCTC-AAAAAAAAAAAAAAAA 1031
X90693 AAAATCTTTTGGATTTC ATTTGAAGTGTTTCT 1200
X90694 1200
L36156 TGT-TCTT--C TTGGTATTATACTA--T 1200
X90692 GGGA-CTGTAGAAGCTCCCTAATAATATTTGTGTCAAAGT 1200
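Figure 3A aligns the sequence in row L78163, which matches the cDNA shown in Figure 1, with five other sequences (U41657, X90693, X90694, L36156 and X90692). A common way to summarize such an alignment is pairwise identity over the columns where neither sequence has a gap. The Python sketch below illustrates that measure under that stated convention; it is not the method used to prepare the figure, and the function name `pairwise_identity` is hypothetical.

```python
def pairwise_identity(row_a: str, row_b: str, gap: str = "-"):
    """Compare two aligned rows column by column, skipping any column with a gap.

    Returns (identical columns, compared columns, percent identity)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = identical = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue  # columns with a gap in either row are ignored
        compared += 1
        identical += a == b
    percent = 100.0 * identical / compared if compared else 0.0
    return identical, compared, percent


# First block after position 377 of the alignment (rows L78163 and U41657).
l78163 = "TCTTGCTATTGCAGCTGAAATAGCTTCTGTT-CTGGGAGGAGGTCCAGGA"
u41657 = "TCTTGCTATTGCAGCTGAAATAGCTTCTGTTGCTGGGAGGAGGTC-AGGA"
print(pairwise_identity(l78163, u41657))  # (48, 48, 100.0)
```

Skipping gapped columns is only one convention; other tools divide by the full alignment length or by the shorter ungapped sequence, which gives lower figures for the same rows.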
Figure 3B

L78163 MGSMRLLVVALLCAFAMHAGFSVSY---AQLTPTFYRETCPNLFPIVFGV 47
U41657 - 0
X90693 MNSLRAVAIALCCIV--VVLGGLPFSSNAQLDPSFYRKTCPNVSSIVREV 48
X90694 MNSL---ATSMWCVVLLVVLGGLPFSSDAQLSPTFYSKTCPTVSSIVSNV 47
L36156 M WCVVLLVVLGGLPFSSDAQLSPTFYSKTCPTVSSIVSNV 40
X90692 MLGLSATA FCCMVFVLIGGVPFS-NAQLDPSFYNSTCSNLDSIVRGV 46

L78163 IFDASFTDPRIGASLMRLHFHDCFVQGCDGSVLLNNTDTIESEQDALPNI 97
U41657 FHDCFVQGCDGSVLLNNTDTIESEQDALPNI 31
X90693 IRSVSKKDPRMLASLVRLHFHDCFVQGCDASVLLNKTDTVV 98
X90694 LTNVSKTDPRMLASLVRLHFHDC 97
L36156 LTNVSKTDPRMLASLVRLHFHDCFVLGCDASVI^^ 90
X90692 LTNVSQSDPRMLGSLIRLHFHDCFVQGCDASILLNDTATIVSEQSAPPNN 96
****** ***.*.*★*.*.*. *** *, **

L78163 NSIRGLDVVNDIKTAVENSCPDTVSCADILAIAAEIASVLGGGPGWPVPL 147
U41657 NSIRGLDVVNDIKTAVENSCPDTVSCADILAIAAEIASVAGRRSGWPVPL 81
X90693 NSLRGLDVVNQIKTAVEKACPNTVSCADILALSAELSSTLADGPDWKVPL 148
X90694 NSLRGLDVVNQIKIJWEVrc^ 147
L36156 NSLRGLDVVNQIKTAVESACPNTVSCADIL^ 139
X90692 NSIRGLDVINQIKTAVENACPNTVSCADILALSAEISSDLANGPTWQVPL 146
** t *★***.*, ** *** .**.*****★***.. . .* . ***

L78163 GRRDSLTANRTLANQNLPAPFFNLTQLKASFAVQGLNTLDLVTLSGGHTF 197
U41657 GRRDSLTANRTLANQNLPAPFFNLTQLKASFAVQGLNTLDLVTLS 131
X90693 GRRDGLTANQLLANQNLPAPFNTTDQLKAAFAAQGLDTTDLVALSGAHTF 198
X90694 GRRDGLTANRTLANQNLPAPFNSLDQLKAAFTAQGLNTTDLVALSGAHTF 197
L36156 GRRDGLTANRTLANQNLPAPFNSLDHLKLHLTAQGLITPVLVALSGAHTF 189
X90692 GRRDSLTANNSIAAQNLPAPTFNLTRLKSNFDNQNLSTTDLVALSGGHTI 196
****.****. **.****** . *.* * **.**★.**

L78163 GRARCSTFINRLYNFSNTGNPDPTLNTTYLEVLRARCPQNATGDNLTNLD 247
U41657 GRARCSTFINRLYNFSNTGLIH--LDTTYLEVLRARCPQNATGDNLTNLD 179
X90693 GRAHCSLFVSRLYNFSGTGSPDPTLNTTYLQQLRTICPNGGPGTNLTNFD 248
X90694 GRAHCAQFVSRLYNFSSTGSPDPTLNTTYLQQLRTICPNGGPGTNLTNFD 247
L36156 GRAHCAQFVSRLYNFSSTGSPDPTLNTTYLQQLRTICPNGGPGTNLTNFD 239
X90692 GRGQCRFFVDRLYNFSNTGNPDSTLNTTYLQ^ 246
**..* *_******.** . *.*★**. *.. **....*.***..*

L78163 LSTPDQFDNRYYSNLLQLNGLLQSDQELFSTPGADTIPIVNSFSSNQNTF 297
U41657 LSTPDQFDNRYYSNLLQLNGLLQSDQERFSTPGADTIPIiS XA- SANQNTF 228
X90693 PTTPDKFDKNYYSNLQVKKGLLQSDQELFSTSGSDTISIVNKFATDQKAF 298
X90694 PTTPDKFDKNYYSNLQVKKGLLQSDQELFSTSGADTISIVNKFSTDQNAF 297
L36156 PTTPDKFDKNYYSNLQVKKGLLQSDQELFSTSGADTISIVDKFSTDQNAF 289
X90692 PTTPDTFDSNYYSNLQVGKGLFQSDQELFSRNGSDTISIVNSFANNQTLF 296
*** **_***** **,***** ** *.***.. ...*. *

L78163 FSNFRVSMIKMGNIGVLTGDEGEIRLQCNFVN GDSFGLASVAS-K 341
U41657 FSNFRVSMIKMGNIGVLTGDEGEIRLQCNFVN GDSFGLASVAS-K 272
X90693 FESFRAAMIKMGNIGVLTGNQGEIRKQCNFVN---SKSAELGLINVAS-A 344
X90694 FESFKAAMIKMGNIGVLTGTKGEIRKQCNFVNFVNSNSAELDIATIASIV 347
L36156 FESFKAAMIKMGNIGVLTGTKGEIRKQCNFVN---SNSAELDLATIASIV 336
X90692 FENFVASMIKMGNIGVLTGSQGEIRTQCNAVN GNSSGLATVVT-K 340
*,.★ +*********♦★.. **** *** ** ... .*....
Figure 4
Figure 5
Figure 6
Figure 7
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ABSTRACT 



T-DNA tagging with a promoterless β-glucuronidase (GUS) gene
generated a transgenic Nicotiana tabacum plant that expressed GUS
activity only in developing seed coats. Cloning and deletion analysis of the
GUS fusion revealed that the promoter responsible for seed coat
specificity was located in the plant DNA proximal to the GUS gene.
Analysis of the region demonstrated that the seed coat specificity of GUS
expression in this transgenic plant resulted from T-DNA insertion next to a
cryptic promoter. This promoter is useful for restricting the expression of
genes to the developing seed coat in plant seeds.
A SEED COAT-SPECIFIC CRYPTIC PROMOTER IN TOBACCO

Field of Invention 

This invention relates to a cryptic promoter identified from
Nicotiana tabacum (tobacco). Specifically, this invention relates to a seed
coat-specific cryptic promoter isolated from tobacco.

Background and Prior Art 

Bacteria of the genus Agrobacterium can transfer specific
segments of DNA (T-DNA) to plant cells, where the T-DNA stably
integrates into the nuclear chromosomes. Analyses of plants harbouring
T-DNA have revealed that this genetic element may be integrated at
numerous locations and is occasionally found within genes. One
strategy for identifying integration events within genes is to transform
plant cells with specially designed T-DNA vectors that carry, at the end of
the T-DNA, a reporter gene devoid of cis-acting transcriptional and
translational expression signals (i.e., promoterless). Upon integration, the
initiation codon of the promoterless reporter gene is juxtaposed to plant
sequences. T-DNA insertion adjacent to, and downstream of, gene
promoter elements may therefore activate reporter gene expression. The
resulting hybrid genes, referred to as T-DNA-mediated gene fusions,
consist of unknown and thus uncharacterized plant promoters residing at
their natural location within the chromosome, joined to the coding
sequence of a marker gene located on the inserted T-DNA (Fobert et al.,
1991, Plant Mol. Biol. 17, 837-851).

It has generally been assumed that activation of promoterless or
enhancerless marker genes results from T-DNA insertions within or
immediately adjacent to genes. The recent isolation of several T-DNA
insertional mutants (Koncz et al., 1992, Plant Mol. Biol. 20, 963-976;
reviewed in Feldmann, 1991, Plant J. 1, 71-82; Van Lijsebettens et al.,
1991, Plant Sci. 80, 27-37; Walden et al., 1991, Plant J. 1, 281-288;
Yanofsky et al., 1990, Nature 346, 35-39) shows that this is the case for at
least some insertions. However, other possibilities exist. One of these is
that integration of the T-DNA activates silent regulatory sequences that
are not associated with genes. Lindsey et al. (1993, Transgenic Res. 2, 33-47)
referred to such sequences as "pseudo-promoters" and suggested that
they may be responsible for activating marker genes in some transgenic
lines.



Inactive regulatory sequences that are buried in the genome but
are capable of functioning when positioned adjacent to genes
have been described in a variety of organisms, where they have been
called "cryptic promoters" (Al-Shawi et al., 1991, Mol. Cell. Biol. 11,
4207-4216; Fourel et al., 1992, Mol. Cell. Biol. 12, 5336-5344; Irniger et al.,
1992, Nucleic Acids Res. 20, 4733-4739; Takahashi et al., 1991, Jpn. J.
Cancer Res. 82, 1239-1244). Cryptic promoters can be found in the introns
of genes, such as those encoding yeast actin (Irniger et al., 1992, Nucleic
Acids Res. 20, 4733-4739) and a mammalian melanoma-associated antigen
(Takahashi et al., 1991, Jpn. J. Cancer Res. 82, 1239-1244). It has been
suggested that the cryptic promoter of the yeast actin gene may be a relic
of a promoter that was at one time active but lost function once the coding
region was assimilated into the exon-intron structure of the present-day
gene (Irniger et al., 1992, Nucleic Acids Res. 20, 4733-4739). A cryptic
promoter has also been found in an untranslated region of the second
exon of the woodchuck N-myc proto-oncogene (Fourel et al., 1992, Mol.
Cell. Biol. 12, 5336-5344). This cryptic promoter is responsible for
activation of N-myc2, a functional processed gene that arose from
retroposition of an N-myc transcript (Fourel et al., 1992, Mol. Cell. Biol. 12,
5336-5344). These types of regulatory sequences have not yet been
isolated from plants.
This patent application describes, as an example, one transgenic
plant, T218, generated by tagging with a promoterless GUS (β-
glucuronidase) T-DNA vector. This plant is of particular interest because
GUS expression was spatially and developmentally regulated in seed coats,
and a promoter specific to this tissue had not previously been isolated.
Cloning of the insertion site uncovered a cryptic promoter within a region
of the tobacco genome that is not conserved among related species. This
seed coat-specific promoter can be useful for restricting the expression of
selected genes to the seed coat at a specific stage of development.



Summary of Invention 

The present invention is directed to a cryptic promoter identified 
from Nicotiana tabacum (tobacco). Specifically this invention relates to a 
seed coat-specific cryptic promoter isolated from tobacco. 

The transgenic tobacco plant T218 contained a 4.7 kb EcoRI
fragment comprising the 2.2 kb promoterless GUS-nos gene and 2.5 kb of
5' flanking tobacco DNA. Deleting the region between approximately
2.5 and 1.0 kb of the 5' flanking region did not alter GUS expression
compared with the entire 4.7 kb GUS fusion. A further deletion, to 0.5 kb
of the 5' flanking region, resulted in complete loss of GUS activity. Thus the
region between 1.0 and 0.5 kb of the 5' flanking tobacco DNA
contains the elements essential for gene activation. This region is
contained within an XbaI-SnaBI restriction fragment of the flanking
tobacco DNA.

Thus according to the present invention there is provided a seed 
coat-specific cryptic promoter in tobacco contained within a DNA 
sequence, or analogue thereof, as shown in Figure 6. 




Further, according to the present invention, there is provided a
DNA sequence, or analogue thereof, as shown in Figure 6.

This invention also relates to a cloning vector containing a seed
coat-specific cryptic promoter from tobacco, which is contained within a
DNA sequence, or analogue thereof, as shown in Figure 6, and a gene
encoding a protein.

This invention also includes a plant cell that has been transformed
with a cloning vector as described above.

This invention further relates to a transgenic plant containing a
seed coat-specific promoter operatively linked to a gene encoding a
protein.



Brief Description of the Drawings 

Figure 1 depicts the fluorogenic analyses of GUS expression in
plant T218. Each bar represents the average ± one standard deviation of
three samples. Nine different tissues were analyzed: leaf (L), stem (S),
root (R), anther (A), petal (P), ovary (O), sepal (Se), seeds 10 days
post-anthesis (S1) and seeds 20 days post-anthesis (S2). For all measurements
of GUS activity, the fraction attributed to intrinsic fluorescence, as
determined by analysis of untransformed tissues, is shaded black on the
graph. Absence of a black area at the bottom of a histogram bar indicates
that the relative contribution of the background fluorescence is too small
to be apparent.
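The bookkeeping behind Figure 1 (specific activity, mean ± one standard deviation of three samples, and the share due to intrinsic fluorescence of untransformed tissue) can be sketched in Python as below. This is a minimal illustration, not the authors' procedure: it assumes the amount of 4-methylumbelliferone (MU) has already been converted from fluorescence readings to nanomoles with a standard curve, and the function names and example numbers are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def specific_activity(mu_nmol: float, minutes: float, protein_mg: float) -> float:
    """GUS activity expressed as nmol MU produced per minute per mg protein."""
    if minutes <= 0 or protein_mg <= 0:
        raise ValueError("incubation time and protein amount must be positive")
    return mu_nmol / (minutes * protein_mg)


def summarize(replicates: List[float], background: float) -> Tuple[float, float, float]:
    """Return (mean, sample SD, background fraction) for replicate activities.

    `background` is the activity measured on untransformed tissue, i.e. the
    intrinsic fluorescence shown as the shaded part of each bar.
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicates for a standard deviation")
    avg = mean(replicates)
    if avg <= 0:
        raise ValueError("mean activity must be positive")
    return avg, stdev(replicates), min(background / avg, 1.0)


# Hypothetical numbers: 30 nmol MU in 60 min from 0.5 mg protein
print(specific_activity(30.0, 60.0, 0.5))
print(summarize([2.0, 3.0, 4.0], background=0.5))
```

The background fraction is capped at 1.0 so that a bar whose signal is entirely intrinsic fluorescence is drawn fully shaded.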

Figure 2 shows the cloning of the GUS fusion in plant T218
(pT218) and construction of transformation vectors. Plant DNA is
indicated by the solid line and the promoterless GUS-nos gene is indicated
by the open box. The transcriptional start site and presumptive TATA box
are located by the closed and open arrowheads, respectively. DNA probes
#1, 2 and 3 and RNA probe #4 are shown. The EcoRI fragment in pT218
was subcloned in the pBIN19 polylinker to create pT218-1. Fragments
truncated at the XbaI, SnaBI and XbaI sites were also subcloned to create
pT218-2, pT218-3 and pT218-4, respectively. Abbreviations for the
endonuclease restriction sites are as follows: EcoRI (E), HindIII (H),
XbaI (X), SnaBI (N), SmaI (M), SstI (S).

Figure 3 shows the expression pattern of promoter fusions during
seed development. GUS activity in developing seeds (4-20 days
post-anthesis (dpa)) of (Fig. 3a) plant T218 (●-●) and (Fig. 3b) plants
transformed with vectors pT218-1 (○-○), pT218-2 (□-□), pT218-3 (▽-▽) and
pT218-4 (△-△), which are illustrated in Figure 2. The 2-day delay in the
peak of GUS activity during seed development seen with the pT218-2
transformant likely reflects variation in greenhouse conditions.

Figure 4 shows GUS activity in 12 dpa seeds of independent
transformants produced with vectors pT218-1 (○), pT218-2 (□), pT218-3
(▽) and pT218-4 (△). The solid markers indicate the plants shown in
Figure 3b and the arrows indicate the average values for plants
transformed with pT218-1 or pT218-2.

Figure 5 shows the mapping of the T218 GUS fusion termini and 
expression of the region surrounding the insertion site in untransformed 
plants. 

(Fig. 5a) Mapping of the GUS mRNA termini in plant T218. 

The antisense RNA probe from subclone #4 (Figure
2) was used for hybridization with total RNA of
tissues from untransformed plants (10 µg) and from
plant T218 (30 µg). Arrowheads indicate the
anticipated position of protected fragments if
transcripts were initiated at the same sites as the T218
GUS fusion.

(Fig. 5b) RNase protection assay using the antisense (relative
to the orientation of the GUS coding region) RNA
probe from subclone e (Figure 7) against 30 µg total
RNA of tissues from untransformed plants.

P, untreated RNA probe; control assay using the probe and tRNA
only; L, leaves from untransformed plants; 8, 10, 12, seeds from
untransformed plants at 8, 10 and 12 dpa, respectively; T10, seeds of plant
T218 at 10 dpa; +, control hybridization against unlabeled, in vitro-
synthesized sense RNA from subclone c (panel a) or subclone e (panel b).
The two hybridizing bands near the top of the gel are end-labeled DNA
fragments of 3313 and 1049 bp, included in all assays to monitor losses
during processing. Molecular weight markers are in number of bases.

Figure 6 provides the nucleotide sequence of pT218 (top line) and
pIS-1 (bottom line). Sequence identity is indicated by dashed lines. The
T-DNA insertion site is indicated by a vertical line after bp 993. This site
on pT218 is immediately followed by a 12 bp filler DNA, which is followed
by the T-DNA. The first nine amino acids of the GUS gene and the GUS
initiation codon (•) are shown. The major and minor transcriptional start
sites are indicated by a large and a small arrow, respectively. The presumptive
TATA box is identified in boldface. Additional putative TATA and
CAAT boxes are marked with boxes. The locations of direct (1-5) and
indirect (6-8) repeats are indicated by arrows.
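The Figure 6 legend marks direct and indirect (presumably inverted) repeats but does not say how they were found. The following is a minimal sketch of one way to list exact repeats of a chosen length k: it indexes every k-mer and pairs each one with later copies of itself (direct) or of its reverse complement (inverted). The function names and the toy sequences are hypothetical, and real repeat annotation would usually tolerate mismatches.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_repeats(seq: str, k: int) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Find exact k-mer repeats; pairs are 0-based starts with first < second.

    direct:   the same k-mer occurs again further along the strand
    inverted: the reverse complement of a k-mer occurs further along the strand
    """
    seq = seq.upper()
    index: Dict[str, List[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)

    direct, inverted = [], []
    for kmer, starts in index.items():
        # every later occurrence of the same k-mer forms a direct repeat
        for n, a in enumerate(starts):
            for b in starts[n + 1:]:
                direct.append((a, b))
        # pair each occurrence with downstream copies of its reverse complement;
        # requiring b > a records each unordered pair exactly once
        for a in starts:
            for b in index.get(revcomp(kmer), []):
                if b > a:
                    inverted.append((a, b))
    return sorted(direct), sorted(inverted)


print(find_repeats("ACCGTTACCG", 4))   # ACCG recurs: a direct repeat
print(find_repeats("ACGAAAATCGT", 4))  # ACGA ... TCGT: an inverted repeat
```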

Figure 7 shows the base composition of the region surrounding the
T218 insertion site cloned from untransformed plants. The site of T-DNA
insertion in plant T218 is indicated by the vertical arrow. The positions of
the two genomic clones pIS-1 and pIS-2, and of the various RNA probes
(a-e) used in RNase protection assays, are indicated beneath the graph.
Figure 8 shows the Southern blot analyses of the insertion site in
Nicotiana species. DNA from N. tomentosiformis (N. tom), N. sylvestris
(N. syl) and N. tabacum (N. tab) was digested with HindIII (H), XbaI (X) and
EcoRI (E) and hybridized using probe #2 (Figure 2). Lambda HindIII
markers (kb) are indicated.

Figure 9 shows the AT content of 5' non-coding regions of plant
genes. A program was written in PASCAL to scan GenBank release 75.0
and to calculate the AT contents of the 5' non-coding regions (solid bars)
and the coding regions (hatched bars) of all plant genes identified as
"Magnoliophyta" (flowering plants). The regions from -200 to -1 and from
+1 to +200 were compared. Shorter sequences were also accepted if they
were at least 190 bp long. The horizontal axis shows the AT content (%).
The vertical axis shows the number of sequences having the specified
AT content.
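The Figure 9 procedure can be sketched in Python as follows. The original was a PASCAL program scanning GenBank release 75.0, and its code is not reproduced here. This sketch assumes each gene's sequence and the 0-based index of its first coding nucleotide (+1) have already been extracted, so GenBank parsing is not shown. Excluding ambiguous bases from the denominator and using 5% histogram bins are also assumptions. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

WINDOW = 200   # -200..-1 and +1..+200, as in the Figure 9 legend
MIN_LEN = 190  # shorter windows accepted if at least this long


def at_percent(seq: str) -> float:
    """AT content (%) counting only unambiguous A/C/G/T bases."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases in window")
    return 100.0 * (s.count("A") + s.count("T")) / acgt


def flank_at(seq: str, cds_start: int) -> Tuple[Optional[float], Optional[float]]:
    """AT% of the 5' non-coding window and of the first coding window.

    `cds_start` is the 0-based index of the first coding nucleotide (+1).
    A window shorter than MIN_LEN is reported as None (sequence rejected).
    """
    upstream = seq[max(0, cds_start - WINDOW):cds_start]
    coding = seq[cds_start:cds_start + WINDOW]
    up = at_percent(upstream) if len(upstream) >= MIN_LEN else None
    cd = at_percent(coding) if len(coding) >= MIN_LEN else None
    return up, cd


def histogram(values: Iterable[float], bin_width: float = 5.0) -> Dict[float, int]:
    """Count values per AT% bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


print(flank_at("A" * 200 + "GC" * 100, 200))
```

Collecting the non-None upstream and coding values over all accepted entries and passing each list to `histogram` gives the two bar series in the figure.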



Detailed Description of the Preferred Embodiments 

T-DNA tagging with a promoterless β-glucuronidase (GUS) gene
generated a transgenic Nicotiana tabacum plant that expressed GUS
activity only in developing seed coats. Cloning and deletion analysis of the
GUS fusion revealed that the promoter responsible for seed coat
specificity was located in the plant DNA proximal to the GUS gene.
Deletion analyses localized the cryptic promoter to an approximately 0.5
kb region between an XbaI and a SnaBI restriction endonuclease site of the
5' flanking tobacco DNA. This region spans from nucleotide 1 to
nucleotide 467, as shown in Figure 6.

Thus, the present invention includes a DNA sequence comprising
the seed coat-specific cryptic promoter from tobacco, and analogues
thereof. Analogues of the cryptic promoter include any substitutions,
deletions or additions to the region, provided that said analogues maintain
the seed coat-specific expression activity.

The term cryptic promoter means a promoter that is not associated 
with a gene and thus does not control expression in its native location. 
These inactive regulatory sequences are buried in the genome but are 
capable of being functional when positioned adjacent to a gene. 

The DNA sequence of the present invention thus includes the DNA
sequence shown in Figure 6, the promoter region within that sequence
(for example, from nucleotide 1 to 476), and analogues thereof.
Analogues include those DNA sequences that hybridize under stringent
hybridization conditions (see Maniatis et al., in Molecular Cloning (A
Laboratory Manual), Cold Spring Harbor Laboratory, 1982, pp. 387-389) to
the DNA sequence shown in Figure 6, provided that said sequences
maintain the seed coat-specific promoter activity. One example of such
stringent conditions is hybridization in 4× SSC at 65°C, followed by
washing in 0.1× SSC at 65°C for one hour. Alternatively, an exemplary
stringent hybridization condition is 50% formamide, 4× SSC at 42°C.
Analogues also include those DNA sequences that hybridize to the
sequence shown in Figure 6 under relaxed hybridization conditions,
provided that said sequences maintain the seed coat-specific promoter
activity. Examples of such relaxed (non-stringent) hybridization conditions
include hybridization in 4× SSC at 50°C or with 30-40% formamide at 42°C.

There are several lines of evidence suggesting that the seed coat-
specific expression of GUS activity in plant T218 is regulated by a
cryptic promoter. The region surrounding the promoter and the
transcriptional start site of the GUS gene is not transcribed in
untransformed plants. Transcription was observed only in plant T218,
when the T-DNA was inserted in cis. DNA sequence analysis did not reveal
a long open reading frame within the 3.3 kb region cloned. Moreover, the
region is very AT-rich and is predicted to be noncoding (data not shown) by
the Fickett algorithm (Fickett, 1982, Nucleic Acids Res. 10, 5303-5318) as
implemented in DNASIS 7.0 (Hitachi). Southern blots revealed that the
insertion site lies within the N. tomentosiformis genome and, unlike what
would be expected for a region containing an important gene, is not
conserved among related species.
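The text does not say which tool or length cut-off was used to decide that no long open reading frame exists in the cloned region. A minimal six-frame scan is sketched below for illustration. It assumes an ORF is an ATG followed in frame by a stop codon, and it treats the 100-codon default for "long" as an arbitrary choice rather than a value from this work. The function name is hypothetical. This is a simple ORF search, not the Fickett statistic cited above.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """Six-frame scan for ATG...stop ORFs of at least `min_codons` codons.

    Returns (strand, start, end) with 0-based, end-exclusive coordinates on the
    scanned strand ("-" coordinates refer to the reverse complement); the stop
    codon is included in the interval but not in the codon count.
    """
    seq = seq.upper()
    strands = (("+", seq), ("-", seq.translate(_COMPLEMENT)[::-1]))
    hits = []
    for strand, s in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens an ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        hits.append((strand, start, i + 3))
                    start = None  # the stop codon closes it
    return hits


toy = "ATG" + "GCT" * 5 + "TAA"  # hypothetical 6-codon ORF plus stop
print(find_orfs(toy, min_codons=5))
```

An empty result for the cloned region at the chosen threshold would correspond to the "no long open reading frame" observation.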

As this is the first report of a cryptic promoter in plants, it is
impossible to estimate the degree to which cryptic promoters may
contribute to the high frequencies of promoterless marker gene activation
in plants. It is interesting to note that transcriptional GUS fusions in
Arabidopsis occur at much greater frequencies (54%) than translational
fusions (1.6%; Kertbundit et al., 1991, Proc. Natl. Acad. Sci. USA 88,
5212-5216). The possibility that cryptic promoters may account for some
fusions was recognized by Lindsey et al. (1993, Transgenic Res. 2, 33-47).

The results disclosed herein confirm reports by others (Gheysen et al., 1987,
Proc. Natl. Acad. Sci. USA 84, 6169-6173; and 1991, Genes Dev. 5, 287-297)
that T-DNA may insert into A-T-rich regions, as do plant transposable
elements (Capel et al., 1993, Nucleic Acids Res. 21, 2369-2373). We
illustrate that promoters of plant genes are also A-T-rich, raising the
speculation that gene insertions into these regions could facilitate the rapid
acquisition of new regulatory elements during gene evolution.

The insertion of functional genes into the nuclear genome and
acquisition of new regulatory sequences has already played a major role in
the diversification of certain genes and the endosymbiosis of organelles. In
plants, most organellar proteins are nuclear-encoded because of the ongoing
transfer of their genes into the nucleus (Palmer, 1991, in Bogorad L and
Vasil IK (eds), The Molecular Biology of Plastids, Academic Press, San
Diego, pp. 5-53). Recently, it has been shown that the cox2 gene of
cowpea (Nugent and Palmer, 1991, Cell 66, 473-481) and soybean (Covello
and Gray, 1992, EMBO J. 11, 3815-3820) was transferred from the
mitochondria to the nucleus without a promoter via an RNA intermediate. The
results disclosed herein with T-DNA-mediated gene fusions reveal the
ease with which promoters can be acquired by incoming genes. The
presence of cryptic promoters and diverse regulatory elements in
intergenic regions may ensure that genes rapidly acquire the features
needed to meet the demands of complex multicellular organisms.

The cryptic promoter of the present invention can also be used to
restrict the expression of any given gene, spatially and developmentally,
to developing seed coats. Some examples of such uses, which are not to
be considered limiting, include:



1. Modification of storage reserves in seed coats, such as starch,
by the expression of yeast invertase to mobilize the starch or
expression of the antisense transcript of ADP-glucose
pyrophosphorylase to inhibit starch biosynthesis.

2. Modification of seed color contributed by condensed tannins
in the seed coats by expression of antisense transcripts of the
phenylalanine ammonia lyase or chalcone synthase genes.

3. Modification of fibre content in seed-derived meal by
expression of antisense transcripts of the caffeic acid O-
methyltransferase or cinnamoyl alcohol dehydrogenase
genes.

4. Inhibition of seed coat maturation by expression of
ribonuclease genes to allow for increased seed size, to
reduce the relative biomass of seed coats, and to aid in
dehulling of seeds.

5. Expression of genes in seed coats coding for insecticidal
proteins such as α-amylase inhibitor or protease inhibitor.

6. Partitioning of seed metabolites such as glucosinolates into
seed coats for nematode resistance.

Thus this invention is directed to such promoter and gene
combinations. Further, this invention is directed to such promoter and
gene combinations in a cloning vector, wherein the gene is under the
control of the promoter and is capable of being expressed in a plant cell
transformed with the vector. This invention further relates to transformed
plant cells and transgenic plants regenerated from such plant cells. The
promoter and promoter-gene combinations of the present invention can be
used to transform any plant cell for the production of any transgenic plant.
The present invention is not limited to any plant species.

While this invention is described in detail with particular reference 
to preferred embodiments thereof, said embodiments are offered to 
illustrate but not limit the invention. 

EXAMPLES 

Characterization of a Seed Coat-Specific GUS Fusion

Transfer of binary constructs to Agrobacterium and leaf disc
transformation of Nicotiana tabacum SR1 were performed as described by
Fobert et al. (1991, Plant Mol. Biol. 17, 837-851). Plant tissue was
maintained on 100 µg/ml kanamycin sulfate (Sigma) throughout in vitro
culture.

Nine hundred and forty transgenic plants were produced. Several
hundred independent transformants were screened for GUS activity in
developing seeds using the fluorogenic assay. One of these, T218, was
chosen for detailed study because of its unique pattern of GUS expression.

Fluorogenic and histological GUS assays were performed according
to Jefferson (1987, Plant Mol. Biol. Rep. 5, 387-405), as modified by
Fobert et al. (1991, Plant Mol. Biol. 17, 837-851). For initial screening,
leaves were harvested from in vitro-grown plantlets. Later, flowers
corresponding to developmental stages 4 and 5 of Koltunow et al. (1990,
Plant Cell 2, 1201-1224) and beige seeds, approximately 12-16 dpa (Chen
et al., 1988, EMBO J. 7, 297-302), were collected from plants grown in the
greenhouse. For detailed, quantitative analysis of GUS activity, leaf, stem
and root tissues were collected from kanamycin-resistant F1 progeny of the
different transgenic lines grown in vitro. Floral tissues were harvested at
developmental stages 8-10 (Koltunow et al., 1990, Plant Cell 2, 1201-1224)
from the original transgenic plants. Flowers of these plants were also
tagged, and developing seeds were collected from capsules at 10 and 20
dpa. In all cases, tissue was weighed, immediately frozen in liquid
nitrogen, and stored at -80°C.

Tissues analyzed by histological assay were at the same
developmental stages as those listed above. Different hand-cut sections
were analyzed for each organ. For each plant, histological assays were
performed on at least two different occasions to ensure reproducibility.
Except for floral organs, all tissues were assayed in phosphate buffer
according to Jefferson (1987, Plant Mol. Biol. Rep. 5, 387-405), with 1 mM
X-Gluc (Sigma) as substrate. Flowers were assayed in the same buffer
containing 20% (v/v) methanol (Kosugi et al., 1990, Plant Sci. 70, 133-140).

Tissue-specific patterns of GUS expression were found only in
seeds. For instance, GUS activity in plant T218 (Figure 1) was localized in
seeds from 9 to 17 days post-anthesis (dpa). GUS activity was not detected
in seeds at other stages of development or in any other tissue analyzed,
which included leaf, stem, root, anther, ovary, petal and sepal (Figure 1).
Histological staining with X-Gluc revealed that GUS expression in seeds at
14 dpa was localized in seed coats but was absent from the embryo,
endosperm, vegetative organs and floral organs (results not shown).

The seed coat specificity of GUS expression was confirmed with the
more sensitive fluorogenic assay of seeds derived from reciprocal crosses
with untransformed plants. The seed coat differentiates from maternal
tissues, the integuments, which do not participate in double
fertilization (Esau, 1971, Anatomy of Seed Plants. New York: John Wiley
and Sons). If GUS activity is strictly confined to the seed coat, it must
originate from GUS fusions transmitted to seeds maternally and not by
pollen. As shown in Table 1, this is indeed the case. As a control, GUS
fusions expressed in the embryo and endosperm, which are products of
double fertilization, should be transmitted through both gametes. This is
illustrated in Table 1 for GUS expression driven by the napin promoter
(BngNAP1; Baszczynski and Fallis, 1990, Plant Mol. Biol. 14, 633-635),
which is active in both embryo and endosperm (data not shown).
Table 1. GUS activity in seeds at 14 days post-anthesis.

Cross (female × male)     GUS activity (nmol MU/min/mg protein)

T218 × T218               1.09 ± 0.39
T218 × WT(a)              3.02 ± 0.19
WT × T218                 0.04 ± 0.005
WT × WT                   0.04 ± 0.005
NAP-5(b) × NAP-5          14.6 ± 7.9
NAP-5 × WT                3.42 ± 1.60
WT × NAP-5                2.91 ± 1.97

(a) WT, untransformed plants.
(b) Transgenic tobacco plants with the GUS gene fused to the
napin (BngNAP1) promoter (Baszczynski and Fallis, 1990, Plant
Mol. Biol. 14, 633-635).
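The reciprocal-cross argument can be made explicit with the Table 1 means. The sketch below assumes each cross is written female × male, the usual convention and the only reading consistent with the text. It calls a cross positive when its mean exceeds a fixed multiple of the WT × WT background. The 5-fold cut-off and the function name are hypothetical, and the standard deviations are ignored, so this is only an illustration of the logic, not a statistical test.

```python
from typing import Dict, Tuple

# Mean GUS activities from Table 1 (nmol MU/min/mg protein),
# keyed by (female, male) parent.
TABLE_1: Dict[Tuple[str, str], float] = {
    ("T218", "T218"): 1.09,
    ("T218", "WT"): 3.02,
    ("WT", "T218"): 0.04,
    ("WT", "WT"): 0.04,
    ("NAP-5", "NAP-5"): 14.6,
    ("NAP-5", "WT"): 3.42,
    ("WT", "NAP-5"): 2.91,
}


def transmission(data: Dict[Tuple[str, str], float], line: str, fold: float = 5.0) -> str:
    """Classify how a GUS fusion is expressed in seeds of reciprocal crosses.

    A cross counts as positive if its mean exceeds `fold` times the WT x WT
    background; the 5-fold cut-off is an arbitrary illustrative choice.
    """
    background = data[("WT", "WT")]
    via_female = data[(line, "WT")] > fold * background
    via_male = data[("WT", line)] > fold * background
    if via_female and via_male:
        return "both parents"   # expected for embryo/endosperm expression
    if via_female:
        return "maternal only"  # expected for maternal tissue such as the seed coat
    if via_male:
        return "paternal only"
    return "not detected"


print(transmission(TABLE_1, "T218"))   # maternal only
print(transmission(TABLE_1, "NAP-5"))  # both parents
```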

Cloning and Analysis of the Seed Coat-Specific GUS Fusion 

Genomic DNA was isolated from freeze-dried leaves using the
protocol of Sanders et al. (1987, Nucleic Acids Res. 15, 1543-1558). Ten
micrograms of T218 DNA were digested for several hours with EcoRI using
the appropriate manufacturer-supplied buffer supplemented with 15 mM
spermidine. After electrophoresis through a 0.8% TAE agarose gel, the
DNA size fraction around 4-6 kb was isolated, purified using the
GeneClean kit (BIO 101 Inc., La Jolla, CA), ligated to phosphatase-treated,
EcoRI-digested Lambda GEM-2 arms (Promega) and packaged in vitro as
suggested by the supplier. Approximately 125,000 plaques were
transferred to nylon filters (Nytran, Schleicher and Schuell) and screened
by plaque hybridization (Rutledge et al., 1991, Mol. Gen. Genet. 229,
31-40), using the 3' end (termination signal) of the nos gene as probe (probe
#1, Figure 2). This sequence, contained in a 260 bp SstI/EcoRI restriction
fragment from pPRF-101 (Fobert et al., 1991, Plant Mol. Biol. 17, 837-851),
was labeled with [α-32P]dCTP (NEN) using random priming (Stratagene).
After plaque purification, phage DNA was isolated (Sambrook et al., 1989,
Molecular Cloning: A Laboratory Manual. New York: Cold Spring Harbor
Laboratory Press), mapped and subcloned into pGEM-4Z (Promega). The
EcoRI fragment and the deletions shown in Figure 2 were inserted into
pBIN19 (Bevan, 1984, Nucleic Acids Res. 12, 8711-8721). Restriction
mapping was used to determine the orientation of the fusion in pBIN19 and
to confirm plasmid integrity. Plants were transformed with a derivative
that contained the 5' end of the GUS gene distal to the left border repeat.
This orientation is the same as that of the GUS gene in the binary vector
pBI101 (Jefferson, 1987, Plant Mol. Biol. Rep. 5, 387-405).

The GUS fusion in plant T218 was isolated as a 4.7 kb EcoRI
fragment containing the 2.2 kb promoterless GUS-nos gene at the T-DNA
border of pPRF120 and 2.5 kb of 5' flanking tobacco DNA (pT218, Figure
2), using the nos 3' fragment as probe (probe #1, Figure 2). To confirm
the ability of the flanking DNA to activate the GUS coding region, the
entire 4.7 kb fragment was inserted into the binary transformation vector
pBIN19 (Bevan, 1984, Nucleic Acids Res. 12, 8711-8721), as shown in Figure
2. Several transgenic plants were produced by Agrobacterium-mediated
transformation of leaf discs. Southern blots indicated that each plant
contained 1-4 T-DNA insertions at unique sites. The spatial patterns of
GUS activity were identical to that of plant T218. Histologically, GUS
staining was restricted to the seed coats of 14 dpa seeds and was absent in
embryos and 20 dpa seeds (results not shown). Fluorogenic assays of GUS
activity in developing seeds showed that expression was restricted to seeds
between 10 and 17 dpa, reaching a maximum at 12 dpa (Figures 3a and
3b). The 4.7 kb fragment therefore contained all of the elements required
for the tissue-specific and developmental regulation of GUS expression.

To locate regions within the flanking plant DNA responsible for
seed coat specificity, truncated derivatives of the GUS fusion were
generated (Figure 2) and introduced into tobacco plants. Deletion of the
region between approximately 2.5 and 1.0 kb 5' of the insertion site
(pT218-2, Figure 2) did not alter expression compared with the entire 4.7
kb GUS fusion (Figures 3b and 4). Further deletion of the DNA, to the
SnaBI restriction site approximately 0.5 kb 5' of the insertion site (pT218-3,
Figure 2), resulted in the complete loss of GUS activity in developing
seeds (Figures 3b and 4). This suggests that the region between
approximately 1.0 and 0.5 kb 5' of the insertion site contains elements
essential to gene activation. GUS activity in seeds remained absent with more
extensive deletion of plant DNA (pT218-4, Figures 2, 3b and 4) and was
not found in other organs, including leaf, stem, root, anther, petal, ovary or
sepal, from plants transformed with any of the vectors (data not shown).
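The inference drawn from the deletion series can be written as a small rule. If every construct keeps the DNA next to the insertion site and differs only in how far it extends 5', a required element must lie between the longest inactive construct and the shortest active one. The sketch below assumes exactly that nesting. The function name is hypothetical, and the kb values are the approximate extents given in the text.

```python
from typing import List, Optional, Tuple

# (construct, kb of tobacco DNA retained 5' of the insertion site, GUS active in seeds?)
Construct = Tuple[str, float, bool]


def essential_interval(constructs: List[Construct]) -> Optional[Tuple[float, float]]:
    """Bracket an element required for activity using nested 5' deletions.

    Returns (lower, upper) in kb 5' of the insertion site, or None when the
    data do not bracket a single element (no active or no inactive construct,
    or an inactive construct longer than an active one).
    """
    active = [kb for _, kb, on in constructs if on]
    inactive = [kb for _, kb, on in constructs if not on]
    if not active or not inactive:
        return None
    lower, upper = max(inactive), min(active)
    return (lower, upper) if lower < upper else None


# Approximate extents given in the text for pT218-1, -2 and -3
series = [("pT218-1", 2.5, True), ("pT218-2", 1.0, True), ("pT218-3", 0.5, False)]
print(essential_interval(series))  # (0.5, 1.0)
```

Adding pT218-4, which is shorter than pT218-3 and also inactive, would not change the bracket.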

The transcriptional start site of the GUS gene in plant T218 was
determined by RNase protection assays with RNA probe #4 (Figure 2),
which spans the T-DNA/plant DNA junction. For RNase protection
assays, various restriction fragments from pIS-1, pIS-2 and pT218 were
subcloned into the transcription vector pGEM-4Z as shown in Figures 7
and 2, respectively. A 440 bp HindIII fragment of the tobacco
acetohydroxyacid synthase SURA gene was used to detect SURA and
SURB mRNA. DNA templates were linearized and transcribed in vitro
with either T7 or SP6 polymerase to generate strand-specific RNA probes
using the Promega transcription kit and [α-32P]CTP as labeled nucleotide.
RNA probes were further processed as described in Ouellet et al. (1992,
Plant J. 2, 321-330). RNase protection assays were performed as described
in Ouellet et al. (1992, Plant J. 2, 321-330), using 10-30 µg of total RNA
per assay. Probe digestion was done at 30°C for 15 min using 30 µg ml-1
RNase A (Boehringer Mannheim) and 100 units ml-1 RNase T1
(Boehringer Mannheim). Figure 5 shows that two termini were mapped in
the plant DNA. The major 5' terminus is situated at an adenine residue,
122 bp upstream of the T-DNA insertion site (Figure 6). The sequence at
this transcriptional start site is similar to the consensus sequence for plant
genes (C/TTCATCA; Joshi, 1987, Nucleic Acids Res. 15, 6643-6653). A
TATA box consensus sequence is present 37 bp upstream of this start site
(Figure 6). The second, minor terminus mapped 254 bp from the insertion
site, in an area where no obvious consensus motifs could be identified
(Figure 6).
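Consensus notation such as C/TTCATCA means that either base may occupy the first position (IUPAC code Y). A minimal Python sketch of scanning one strand for such degenerate motifs follows. The TATAWAW form of the TATA box is a commonly used approximation and is an assumption here, not the consensus used in this work. The function name and the toy sequence are hypothetical.

```python
import re
from typing import List

# IUPAC nucleotide codes needed for the consensus strings below
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
         "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}


def find_motif(seq: str, consensus: str) -> List[int]:
    """0-based start positions of (possibly overlapping) matches on the given strand."""
    pattern = "".join(IUPAC[c] for c in consensus.upper())
    # a zero-width lookahead lets overlapping matches be reported
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


TSS_CONSENSUS = "YTCATCA"   # C/TTCATCA, as cited from Joshi (1987)
TATA_CONSENSUS = "TATAWAW"  # a commonly used TATA box form (assumption)

toy = "GG" + "TATAAAT" + "C" * 20 + "CTCATCA" + "GG"
tss = find_motif(toy, TSS_CONSENSUS)
tata = find_motif(toy, TATA_CONSENSUS)
print(tss, tata, [t - b for t in tss for b in tata])
```

The last printed list gives each TATA-to-start-site spacing. On a real sequence this spacing could be compared with the 37 bp reported above.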

The tobacco DNA upstream of the insertion site is very AT-rich
(>75%; see Figure 7). A search for promoter-like motifs and scaffold
attachment regions (SARs), which are often associated with promoters
(Breyne et al., 1992, Plant Cell 4, 463-471; Gasser and Laemmli, 1986, Cell
46, 521-530), identified several putative regulatory elements in the first 1.0
kb of tobacco DNA flanking the promoterless GUS gene (data not shown).
However, the functional significance of these sequences remains to be
determined.
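A base-composition profile like the one in Figure 7 is usually computed over sliding windows. The sketch below illustrates how stretches above 75% AT could be flagged. Window size, step and the function names are assumptions, since the figure legend does not state them.

```python
from typing import List, Tuple


def at_profile(seq: str, window: int = 100, step: int = 50) -> List[Tuple[int, float]]:
    """(window start, AT%) along the sequence, ignoring non-ACGT characters."""
    s = seq.upper()
    profile = []
    for start in range(0, len(s) - window + 1, step):
        win = s[start:start + window]
        acgt = sum(win.count(b) for b in "ACGT")
        if acgt:  # skip windows made only of ambiguous bases
            profile.append((start, 100.0 * (win.count("A") + win.count("T")) / acgt))
    return profile


def at_rich_windows(profile: List[Tuple[int, float]], threshold: float = 75.0) -> List[int]:
    """Start positions of windows whose AT content exceeds `threshold` percent."""
    return [start for start, at in profile if at > threshold]


demo = "A" * 100 + "G" * 100  # hypothetical sequence
print(at_profile(demo))
print(at_rich_windows(at_profile(demo)))
```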



Cloning and Analysis of the Insertion Site from Untransformed Plants

A lambda DASH genomic library was prepared from DNA of 
untransformed N. tabacum SRI plants by Stratagene for cloning of the 
insertion site corresponding to the gene fusion in plant T218. The 
screening of 500,000 plaques with probe #2 (Figure 2) yielded a single 
lambda clone. The £coRI and Xbal fragments were subcloned in pGEM- 
4Z to generate pIS-1 and pK-2. Figure 7 shows these two overlapping 
subclones, pIS-1 (3.0 kb) and pIS-2 (1.1 kb), which contain tobacco DNA 
spanning the insertion site (marked with a vertical arrow). DNA sequence 
analysis (using dideoxy nucleotides in both directions) revealed that the 
clones, pT218 and pK-1, were identical over a length of more than 25 kb, 
from the insertion site to their 5* ends, except for a 12 bp filler DNA 
insert of unknown origin at the T-DNA border (Figure 6 and data not 
shown). The presence of filler DNA is a common feature of T-DNA/plant 
DNA junctions (Gheysen et al., 1991, Gene 94, 155-163). Gross
rearrangements that sometimes accompany T-DNA insertions (Gheysen et
al., 1990, Gene 94, 155-163; and 1991, Genes Dev. 5, 287-297) were not
found (Figure 6) and therefore could not account for the promoter activity
associated with this region. The region of pIS-1 and pIS-2 3' of the
insertion site is also very AT-rich (Figure 7). 

To determine whether there was a gene associated with the pT218 
promoter, more than 3.3 kb of sequence contained within pIS-1 and pIS-2
was analyzed for the presence of long open reading frames (ORFs). 
However, none were detected in this region (data not shown). To 
determine whether the region surrounding the insertion site was 
transcribed in untransformed plants, Northern blots were performed with 
RNA from leaf, stem, root, flower and seeds at 4, 8, 12, 14, 16, 20 and 24 
dpa. Total RNA from leaves was isolated as described in Ouellet et al.
(1992, Plant J. 2, 321-330). To isolate total RNA from developing seeds,
0.5 g of frozen tissue was pulverized by grinding with dry ice using a
mortar and pestle. The powder was homogenized in a 50 ml conical tube
containing 5 ml of buffer (1 M Tris-HCl, pH 9.0, 1% SDS) using a
Polytron homogenizer. After two extractions with equal volumes of
phenol:chloroform:isoamyl alcohol (25:24:1), nucleic acids were collected
by ethanol precipitation and resuspended in water. The RNA was
precipitated overnight in 2 M LiCl at 0°C, collected by centrifugation,
washed in 70% ethanol and resuspended in water. Northern blot 
hybridization was performed as described in Gottlob-McHugh et al. (1992,
Plant Physiol. 100, 820-825). Probe #3 (Figure 2), which spans the entire
region of pT218 5' of the insertion, did not detect hybridizing RNA bands
(data not shown). To increase the sensitivity of RNA detection and to
include the region 3' of the insertion site within the analysis, RNase 
protection assays were performed with 10 different RNA probes that 
spanned both strands of pIS-1 and pIS-2 (Figure 7). Even after lengthy 
exposures, protected fragments could not be detected with RNA from 8,
10 or 12 dpa seeds or from leaves of untransformed plants (see Figure 5 for
examples with two of the probes tested). The specific conditions used
allowed the resolution of protected RNA fragments as small as 10 bases
(data not shown). Failure to detect protected fragments was not due to
problems of RNA quality, as control experiments using the same samples
detected acetohydroxyacid synthase (AHAS) SURA and SURB mRNAs,
which are expressed at relatively low abundance (data not shown).
Conditions used in the present work were estimated to be sensitive enough 
to detect low-abundance messages representing 0.001-0.01% of total 
mRNA (Ouellet et al., 1992, Plant J. 2, 321-330). Therefore, the
region flanking the site of T-DNA insertion does not appear to be
transcribed in untransformed plants.
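The search for long open reading frames described above can be sketched as a six-frame scan in Python. The minimum length of 100 codons, the rule of starting each ORF at the first ATG after a stop codon, and the function names are assumptions for illustration; the criteria actually applied are not stated here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (other characters are reversed unchanged)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Scan all six reading frames for ATG-initiated ORFs that end in a stop codon.

    Returns (strand, start, end) tuples; start/end are 0-based, end-exclusive
    coordinates on the scanned strand (the reverse complement for "-").
    The codon count includes the ATG and the stop codon. ORFs that run off
    the end of the sequence without a stop codon are not reported.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop codon
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


# Toy example with a lowered threshold; the default (assumed) is 100 codons.
print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))  # [('+', 2, 17)]
```

Reporting only the longest ORF per stop codon (first ATG after the preceding stop) keeps nested, in-frame ATGs from being counted as separate candidates.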



Genomic Origins of the Insertion Site

Southern blots were performed to determine whether the insertion site is
conserved among Nicotiana species. Genomic DNA (5 ng) was isolated,
digested and separated by agarose gel electrophoresis as described above.
After capillary transfer onto nylon filters, probes were labeled and DNA
was hybridized essentially as described in Rutledge et al. (1991, Mol.
Gen. Genet. 229, 31-40). High-stringency washes were in 0.2 x SSC at 65°C,
while low-stringency washes were in 2 x SSC at room temperature. In
Figure 8, DNA of the allotetraploid species N. tabacum and of its
presumptive progenitor diploid species N. tomentosiformis and N. sylvestris
(Okamuro and Goldberg, 1985, Mol. Gen. Genet. 198, 290-298) was
hybridized with probe #2 (Figure 2). Single hybridizing fragments of
identical size were detected in N. tabacum and N. tomentosiformis DNA
digested with JKrcdm, XbaI and EcoRI, but not in N. sylvestris DNA.
Hybridization with pIS-2 (Figure 8), which spans the same region but
also includes DNA 3' of the insertion site, yielded the same results. Neither
probe revealed hybridizing bands, even under conditions of reduced stringency,
in additional Nicotiana species, including N. rustica, N. glutinosa, N.
megalosiphon and N. debneyi (data not shown). Probe #3 (Figure 2)
revealed the presence of moderately repetitive DNA specific to the N.
tomentosiformis genome (data not shown). These results suggest that the
region flanking the insertion site is unique to the N. tomentosiformis
genome and, unlike regions that encode essential genes, is not conserved
among related species.
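Why the 0.2 x SSC/65°C and 2 x SSC/room-temperature washes differ in stringency can be estimated with the widely used empirical melting-temperature formula for DNA-DNA hybrids (Meinkoth and Wahl, 1984): Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 0.61(%formamide) - 500/L. The Python sketch below applies it; the probe length (500 bp), the 25% GC content (chosen to reflect the >75% AT composition of the region) and the absence of formamide in the washes are assumptions.

```python
import math

# 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate -> 0.15 + 3 * 0.015 = 0.195 M Na+
SSC_1X_NA_MOLAR = 0.195


def ssc_sodium(ssc_strength: float) -> float:
    """Na+ molarity of an SSC solution given its strength (e.g. 0.2 for 0.2 x SSC)."""
    return ssc_strength * SSC_1X_NA_MOLAR


def tm_dna_dna(na_molar: float, percent_gc: float, length_bp: int,
               percent_formamide: float = 0.0) -> float:
    """Approximate Tm (deg C) of a perfectly matched DNA-DNA hybrid (Meinkoth and Wahl formula)."""
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * percent_gc
            - 0.61 * percent_formamide - 500.0 / length_bp)


# Assumed probe: 500 bp at 25% GC; no formamide in the wash buffers.
for strength in (0.2, 2.0):
    na = ssc_sodium(strength)
    tm = tm_dna_dna(na, percent_gc=25, length_bp=500)
    print(f"{strength} x SSC: Na+ = {na:.3f} M, estimated Tm = {tm:.1f} deg C")
```

Under these assumptions the estimated Tm is roughly 67°C in 0.2 x SSC, so a 65°C wash lies only a few degrees below it and retains mainly closely matched hybrids. In 2 x SSC the estimate rises by 16.6°C (one tenfold step in salt), and a room-temperature wash lies far below Tm, which is why it can retain more divergent hybrids.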

All scientific publications and patent documents are incorporated 
herein by reference. 

The present invention has been described with regard to preferred 
embodiments. However, it will be obvious to persons skilled in the art 
that a number of variations and modifications can be made without 
departing from the scope of the invention as described in the following
claims. 
THE EMBODIMENTS OF THE INVENTION IN WHICH AN
EXCLUSIVE PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED 
AS FOLLOWS: 

1. A seed coat-specific cryptic promoter from tobacco. 

2. The promoter of claim 1, contained within a DNA sequence, or 
analogue thereof, as shown in Figure 6. 

3. The promoter of claim 2, contained within a DNA sequence, or
analogue thereof, from nucleotide 1 to nucleotide 467 as shown in Figure 
6. 

4. A DNA sequence, or analogue thereof, as shown in Figure 6,
wherein said DNA sequence, or analogue thereof, codes for a seed
coat-specific promoter.

5. The sequence of claim 4, or analogue thereof, from nucleotide 1 to
nucleotide 467 as shown in Figure 6. 

6. A cloning vector which comprises a gene encoding a protein and a 
seed coat-specific cryptic promoter from tobacco, wherein the gene is 
under the control of the promoter and is capable of being expressed in a 
plant cell transformed with the vector. 

7. The vector of claim 6, wherein the seed coat-specific promoter is 
contained within a DNA sequence, or analogue thereof, as shown in
Figure 6. 

8. The vector of claim 7, wherein the seed coat-specific promoter is 
contained within a DNA sequence, or analogue thereof, from nucleotide 1
to nucleotide 467 as shown in Figure 6. 
9. A plant cell which has been transformed with a vector as claimed in
claim 6. 

10. A plant cell which has been transformed with a vector as claimed in 
claim 7. 

11. A plant cell which has been transformed with a vector as claimed in 
claim 8. 

12. A transgenic plant containing a promoter as claimed in claim 1, 
operatively linked to a gene encoding a protein. 

13. A transgenic plant containing a promoter as claimed in claim 2,
operatively linked to a gene encoding a protein. 

14. A transgenic plant containing a promoter as claimed in claim 3, 
operatively linked to a gene encoding a protein. 
FIGURE 6: Nucleotide sequence of the tobacco DNA upstream of the T-DNA insertion site in pT218 (numbered from nucleotide 1; XbaI sites marked).